Statistics with R using Atmospheric Gases: Part 1 The Background to Mauna Loa CO2 Measurement

Atmospheric Gases

Anyone who begins a program of statistics for geoscience will discover that sunspots and variable stars are favoured by most authors. Soon it becomes apparent that Earth data requires a different, often more complex approach.

First, Earth data is more detailed and more plentiful. Second, Earth geoscience data is often linked to a multitude of other physical variables, as well as to economic and social data.

To begin my study of atmospheric gases, I chose CO2, partly because of its inherent interest for geoscience, but also because there are many associations between CO2, other physical data, and socio-economic data.

The R Programming Language

Although I used a spreadsheet to pre-process the data, I have started to use R for the statistical analysis, because R is a language and environment designed specifically for statistical computing and graphics.

R is an official part of the GNU Project, which is supported by the Free Software Foundation. Further information: R language

ESRL, NOAA, and the Carbon Cycle

ESRL is the Earth System Research Laboratory of the U.S. National Oceanic & Atmospheric Administration (NOAA). Among the many research themes of ESRL is the Carbon Cycle.

Carbon, as used in the term carbon cycle, refers to all of the organic and inorganic compounds involved in the generation and storage of carbon dioxide. Study of the carbon cycle has traditionally included physics, chemistry and biology.

Today, study of the carbon cycle has expanded to include law, economics, philosophy, political science, sociology and psychology. However, I aim to confine these posts to physics, chemistry, biology, and those social factors, such as economics, that drive the variation in CO2.

Carbon dioxide is important first because photosynthesis by plants, which depends on CO2, is the basis for most life on Earth, on land and in the ocean. Although water vapour is the most important greenhouse gas, interest in carbon dioxide has grown since Arrhenius calculated in 1896 how CO2 modulates the heating of the Earth by solar energy. This is a second, more direct effect of CO2 on the evolution of plant and animal life on Earth.

Carbon Dioxide (CO2) Measurements

ESRL maintains a global network to monitor Greenhouse Gases, including CO2. The Mauna Loa Observatory in Hawaii is the most famous of all stations in the global network.

Mauna Loa Observatory

The observing site is at an elevation of 3397 meters (about 11,145 ft), far enough removed from local effects that the observations are interpreted as representing the well-mixed global concentration of CO2.

Mauna Loa CO2 Data

ESRL provides the Mauna Loa CO2 data in several versions. I chose the weekly data, downloaded from the FTP address provided.

I decided to begin with the CO2 data collected at Mauna Loa because these measurements are the most widely used CO2 series. The monthly series dates back to 1958 and the weekly series to 1974. The weekly data file I used, created on May 6, 2015, has weekly averages from May 19, 1974 to April 2015. The file is updated every week, with a delay of a few weeks to allow preliminary validation.
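
As a quick consistency check on these dates, the following R sketch builds a weekly index starting from the first week quoted above. The variable names are my own, and the count of 2137 weekly records is the figure quoted in the Missing Data section of this post, not something read from the ESRL file. The sketch also assumes the weeks are evenly spaced with none skipped.

```r
# Sketch: weekly index implied by the dates quoted in this post.
first_week <- as.Date("1974-05-19")   # first weekly average, from the text
n_weeks    <- 2137                    # number of weekly records, from the text

# One date per week, starting at the first week (assumes no skipped weeks)
week_start <- seq(first_week, by = "week", length.out = n_weeks)

range(week_start)          # first and last week in the index
weekdays(week_start[1])    # day of the week on which each week starts
```

If the record count and start date are right, the last date in the index should fall in late April 2015, in line with a file created on May 6, 2015.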

Missing Data

Geoscience data usually has gaps, owing to instrumental problems or to contamination by the environment. The Mauna Loa data updated to May 6, 2015 spanned 2137 weeks, of which 17 weekly entries (0.8%) indicated missing values, a 99.2% recording rate.

The ESRL web site also provides monthly data, which likewise has missing values, together with a second column that contains estimates for the missing values. The weekly data, however, do not include estimates for the missing values.
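
The percentages above follow directly from the two counts. The sketch below reproduces the arithmetic in R and then shows, on a tiny made-up vector, how a missing-value code can be converted to NA so that R's own tools do the counting. The code -999.99 and the sample values are assumptions for illustration only, not real ESRL measurements; check the header of the downloaded file for the actual missing-value code.

```r
# Counts quoted in the text
n_weeks   <- 2137
n_missing <- 17

pct_missing  <- 100 * n_missing / n_weeks
pct_recorded <- 100 - pct_missing
round(c(missing = pct_missing, recorded = pct_recorded), 1)

# Illustration only: made-up values, with -999.99 ASSUMED as the missing code
missing_code <- -999.99
ppm <- c(333.4, -999.99, 332.8, 332.1, -999.99)

# Replace the sentinel with NA so that is.na(), mean(..., na.rm = TRUE), etc. work
ppm[ppm == missing_code] <- NA

sum(is.na(ppm))           # number of missing entries in the toy vector
100 * mean(is.na(ppm))    # percentage missing in the toy vector
mean(ppm, na.rm = TRUE)   # summary statistics must skip the NA entries
```

Converting the code to NA early matters: left as a number, a value such as -999.99 would silently distort every mean and plot computed from the series.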

This is not a setback. It’s an opportunity to develop and test an estimation methodology, which I explain and demonstrate in Part 2 of this article.

Part 2 follows: Estimating Missing Values in the Context of Modeling Mauna Loa CO2