Anyone who begins a program of statistics for geoscience will discover that sunspots and variable stars are the examples favoured by most authors. It soon becomes apparent that Earth data requires a different, often more complex approach.
First, Earth data is more detailed and more plentiful. Second, geoscience data is often linked to a multitude of other physical variables as well as to economic and social data.
To begin my study of atmospheric gases, I chose CO2, partly because of its inherent interest for geoscience, but also because there are many associations between CO2, other physical data, and socio-economic data.
The R Programming Language
Although I used a spreadsheet to pre-process the data, I have started to use R for the statistical analysis, because R is a language and environment designed specifically for statistical computing and graphics.
R is part of the GNU Project, which is sponsored by the Free Software Foundation.
ESRL, NOAA, and the Carbon Cycle
Carbon, as used in the term carbon cycle, refers to all of the organic and inorganic compounds involved in the generation and storage of carbon dioxide. Study of the carbon cycle has traditionally included physics, chemistry and biology.
Today, study of the carbon cycle has expanded to include law, economics, philosophy, political science, sociology and psychology. However, I aim to confine these blogs to physics, chemistry, biology, and those social factors, such as economics, that drive the variation in CO2.
Carbon dioxide matters first because photosynthesis by plants, which depends on it, is the basis for most life on Earth, both on land and in the ocean. Although water vapour is the most important greenhouse gas, interest in carbon dioxide has grown since Arrhenius calculated in 1896 the effect of CO2 in modulating the heating of the Earth by solar energy, a second, more direct effect of CO2 on the evolution of plant and animal life on Earth.
Carbon Dioxide (CO2) Measurements
Mauna Loa Observatory
The elevation of the observing site is 3397 meters (11,141 ft), high enough to be removed from local effects, so that the observations are interpreted as representing the well-mixed global concentration of CO2.
Mauna Loa CO2 Data
I decided to begin with the CO2 data collected at Mauna Loa, because the Mauna Loa measurements are the most widely used CO2 series. The monthly series dates back to 1958 and the weekly series to 1974. The weekly data file created on May 6, 2015 has weekly averages from May 19, 1974 to April 2015. It is updated every week, with a delay of a few weeks to allow preliminary validation.
Geoscience data sets usually have missing values, owing to instrumental problems or to contamination by the environment. The Mauna Loa data updated to May 6, 2015 spanned 2137 weeks, of which 17 weekly entries (0.8%) were marked as missing, a 99.2% recording rate.
The ESRL web site provides the monthly data, which also have missing values, together with a second column that contains estimates for those missing values. The weekly data, however, come with no estimates for the missing values.
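The recording rate quoted above is simple arithmetic, and the week count can be checked against the stated date range. The following is a minimal sketch in Python, used here purely for illustration. It assumes the weekly records are spaced exactly seven days apart starting on May 19, 1974; the variable names are hypothetical.

```python
from datetime import date, timedelta

TOTAL_WEEKS = 2137               # weekly entries in the file (from the text)
MISSING_WEEKS = 17               # entries marked as missing (from the text)
FIRST_WEEK = date(1974, 5, 19)   # start of the first weekly average

# Share of missing and recorded weeks, in percent
missing_pct = 100 * MISSING_WEEKS / TOTAL_WEEKS
recorded_pct = 100 - missing_pct

# Assumption: records are exactly 7 days apart, so the last record
# starts (TOTAL_WEEKS - 1) weeks after the first one.
last_week = FIRST_WEEK + timedelta(weeks=TOTAL_WEEKS - 1)

print(f"missing:  {missing_pct:.1f}%")
print(f"recorded: {recorded_pct:.1f}%")
print(f"last week starts: {last_week.isoformat()}")
```

Under that spacing assumption the last weekly record begins on 2015-04-26, which is consistent with a series that runs to April 2015.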
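Before any estimation can be attempted, the missing weeks have to be separated from the valid ones. The sketch below shows one way this might be done, again in Python for illustration only. It assumes that missing weekly averages are coded with a placeholder value of -999.99 and that each record holds a year, month, day and CO2 value. Both the layout and the sentinel are assumptions, and the rows are made up for the example; they are not actual measurements.

```python
MISSING_SENTINEL = -999.99  # assumed placeholder for a missing weekly average

# Made-up rows for illustration only: (year, month, day, co2_ppm)
rows = [
    (2000, 1, 2, 369.0),
    (2000, 1, 9, MISSING_SENTINEL),
    (2000, 1, 16, 369.5),
    (2000, 1, 23, MISSING_SENTINEL),
]

def split_missing(records, sentinel=MISSING_SENTINEL):
    """Return (valid, missing) lists; missing rows carry the sentinel value."""
    valid = [r for r in records if r[3] != sentinel]
    missing = [r for r in records if r[3] == sentinel]
    return valid, missing

valid, missing = split_missing(rows)
print(len(valid), "valid;", len(missing), "missing")
```

Keeping the missing rows in their own list, rather than dropping them, preserves the dates at which estimates will later be needed.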
This is not a setback. It’s an opportunity to develop and test an estimation methodology, which I explain and demonstrate in Part 2 of this article.
Part 2 follows: Estimating Missing Values in the Context of Modeling Mauna Loa CO2