Statistics with R Using Atmospheric Gases: Part 4 Reviewing the Scientific Literature


Literature Review of the Mauna Loa CO2 Series

I deliberately explored the data before delving into the literature. Even so, I found it difficult to set aside the general knowledge I had acquired from studying geoscience.

Pieter Tans of NOAA's Earth System Research Laboratory (ESRL) provides references to the two most important papers.

Related papers

The literature is substantial, especially now that the measurement of atmospheric gases covers several more gas species, sampled by a global network of observatories. Charles Keeling was a leader in the field for many years.

Many papers now draw on data series from observatories across the global network, which helps put oceanic and land influences beyond North America and northern Europe into perspective.

Google Scholar is a virtual Who’s Who in the study of CO2 worldwide.

Lessons from Thoning, Tans and Komhyr 1989

The literature is too extensive to review here. I conclude that everything I observed in my exploration had already been discovered by Keeling and others more than 40 years ago. The techniques for separating the trend from the annual cycle are varied and complex.

The record of CO2 at Mauna Loa is basically a combination of three signals: a long-term trend, a non sinusoidal yearly cycle, and short-term variations from several days to several weeks that are due to local and regional influences on the CO2 concentration. (Thoning, Tans and Komhyr, 1989)

The analysis must therefore be complex, as illustrated by the methodology used to decompose the data:

The curves used for selecting data and calculating monthly and annual means were obtained using a procedure described by Thoning et al. [1989]. Briefly, the curves are a combination of a quadratic fit to the trend and a fit of sines and cosines to the annual cycle and its first three harmonics. The residuals from these fits are then digitally filtered with a filter having a full width half maximum cutoff at 40 days to remove high-frequency variations. The results of the filtering are then added to the fitted curves. At this point the residual standard deviation of the points from the curve is calculated, and points lying more than +3 [sigma] from the curve are flagged as not representative of background or regionally well-mixed conditions. The procedure is repeated on the unflagged values until no more points are flagged. (Conway, Tans, Waterman, and Thoning, 1994)
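To make the quoted procedure more concrete, here is a minimal Python sketch of its fit-and-flag loop. It is not the authors' code. It assumes that "the annual cycle and its first three harmonics" means four sine/cosine pairs, it omits the 40-day filtering step entirely, and every name and number in it (flag_outliers, the synthetic series, the two injected spikes) is invented for illustration.

```python
import math
import random

def design_row(t, n_harmonics=4):
    """Model terms at time t (in years): quadratic trend plus annual harmonics."""
    row = [1.0, t, t * t]
    for k in range(1, n_harmonics + 1):
        row.append(math.sin(2 * math.pi * k * t))
        row.append(math.cos(2 * math.pi * k * t))
    return row

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(times, values):
    """Least-squares coefficients from the normal equations."""
    rows = [design_row(t) for t in times]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(p)]
    return solve(ata, atb)

def predict(coef, t):
    return sum(c * x for c, x in zip(coef, design_row(t)))

def flag_outliers(times, values, n_sigma=3.0):
    """Refit on the unflagged points until no further point exceeds n_sigma."""
    keep = [True] * len(times)
    while True:
        ts = [t for t, k in zip(times, keep) if k]
        vs = [v for v, k in zip(values, keep) if k]
        coef = fit(ts, vs)
        resid = [v - predict(coef, t) for t, v in zip(ts, vs)]
        sigma = math.sqrt(sum(r * r for r in resid) / len(resid))  # RMS residual
        newly = [i for i, (t, v) in enumerate(zip(times, values))
                 if keep[i] and abs(v - predict(coef, t)) > n_sigma * sigma]
        if not newly:
            return coef, keep
        for i in newly:
            keep[i] = False

# Synthetic weekly series (made-up numbers): trend + annual cycle + noise.
rng = random.Random(0)
times = [i / 52 for i in range(520)]
values = [330 + 1.5 * t + 0.01 * t * t + 3 * math.sin(2 * math.pi * t)
          + rng.uniform(-0.5, 0.5) for t in times]
values[100] += 5.0  # two artificial spikes standing in for non-background air
values[300] += 5.0
coef, keep = flag_outliers(times, values)
```

In this toy case the loop flags only the two injected spikes. With real data the filtering step would change which points are flagged, so this should be read as an outline of the logic only.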

The Thoning et al. paper describes an approach using Fourier analysis: transform the data, filter in the frequency domain, and then reverse the transform to convert the smoothed data back to the time domain. To discover something that has not already been discovered seems a daunting task. As Einstein said, “If at first the idea is not absurd, then there is no hope for it.”
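The transform, filter, and inverse-transform sequence can be sketched in a few lines of Python. This is only an illustration of the general idea. It uses a plain discrete Fourier transform and a crude "brick-wall" cutoff that zeroes every bin above a chosen frequency, which is an assumption of mine and not the filter used in the paper. The function names and the test signal are hypothetical.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform (direct O(n^2) form, fine for short series)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def low_pass(x, cutoff_bins):
    """Zero every frequency bin above cutoff_bins, then return to the time domain."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if freq > cutoff_bins:
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]

# Made-up signal: one slow cycle plus a faster, smaller oscillation.
n = 64
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
fast = [0.5 * math.sin(2 * math.pi * 16 * t / n) for t in range(n)]
signal = [s + f for s, f in zip(slow, fast)]
smoothed = low_pass(signal, cutoff_bins=4)
```

Here the fast oscillation sits exactly on a frequency bin, so the filter removes it cleanly. Real series are rarely that tidy, and a sharp cutoff can introduce ringing, which is one reason smoother filter shapes are usually preferred.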

I do have an absurd idea, a couple of absurd ideas, in fact. The most absurd of my ideas is to use the annual minima and maxima to estimate the trend. My ideas are inspired by comments made by Thoning, Tans, and Komhyr in 1989.
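As a starting point, the extraction of annual extrema might look like the Python sketch below. It assumes each observation is already labeled with the analysis year it belongs to, and it uses the midpoint of each year's minimum and maximum as one crude trend estimate. That midpoint rule is only an assumption made for illustration, as are the function names and the numbers.

```python
from collections import defaultdict

def annual_extrema(years, values):
    """Return {year: (minimum, maximum)} for observations labeled by year."""
    by_year = defaultdict(list)
    for y, v in zip(years, values):
        by_year[y].append(v)
    return {y: (min(vs), max(vs)) for y, vs in sorted(by_year.items())}

def midpoint_trend(extrema):
    """Midpoint of each year's minimum and maximum, one crude trend estimate."""
    return {y: (lo + hi) / 2 for y, (lo, hi) in extrema.items()}

# Made-up values in ppm, three observations per year.
years = [2000, 2000, 2000, 2001, 2001, 2001]
values = [370.0, 373.0, 367.0, 372.0, 375.0, 369.0]
extrema = annual_extrema(years, values)
trend = midpoint_trend(extrema)
```

Other combinations of the extrema are equally possible, for example fitting separate curves through the maxima and the minima.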

It can be seen from Figure 8 that the annual cycle has the same basic shape from year to year, although with some small but significant variations. For example, the peaks of the cycles can vary from a sharp point to a more rounded shape…. The mean peak-to-peak amplitude for the 12 years from 1974 to 1985 was 6.77 ppm, with a standard deviation about the mean of 0.32 ppm.

Enting [1987] found a correlation between the peak heights for each spring and the following fall for SIO [Scripps Institution of Oceanography] monthly mean data from 1960 to 1981. Low-amplitude peaks in the spring were followed by low-amplitude troughs in the following fall. He did not find any correlation between the fall troughs and the following spring peak. If we plot the absolute values of the maximum and minimum values for the seasonal cycle (Table 3) in a manner analogous to that of Enting, we find a correlation opposite to that stated by Enting. We see no correlation between the size of the peaks and troughs in the same year (correlation coefficient = 0, Figure 11a), but we do find a correlation for the size of the fall troughs followed by the spring peak… (pp. 8559-8556).
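The two comparisons described in this passage, peak against trough in the same year and trough against the following spring's peak, can be sketched in Python as below. The helper names are hypothetical and the numbers are invented so that the lagged relationship is exact. They are not the values from Table 3 of the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def peak_trough_correlations(peaks, troughs):
    """peaks[i] and troughs[i] are the sizes of year i's spring peak and fall trough."""
    same_year = pearson(peaks, troughs)
    # Pair each fall trough with the spring peak of the following year.
    trough_then_peak = pearson(troughs[:-1], peaks[1:])
    return same_year, trough_then_peak

# Made-up sizes in ppm relative to the trend, one pair per year.
peaks = [3.3, 3.1, 3.5, 3.2, 3.7]
troughs = [3.0, 3.4, 3.1, 3.6, 3.2]
same_year, trough_then_peak = peak_trough_correlations(peaks, troughs)
```

With only a dozen or so years of data, as in the 1974 to 1985 record, a correlation coefficient carries wide uncertainty, which is part of the case for repeating the comparison on the longer series.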

The dates at which the minima of the annual cycle occur are more consistent than the dates of the maxima. The dates at which the seasonal cycle crosses the trend line are also more consistent for the drawdowns in July than for the increases in January. Peterson et al. [1986] found a similar consistency for the continuous CO2 measurements at Barrow, Alaska. This suggests that the forces that drive the summer CO2 drawdown in the northern hemisphere are stronger and more regular than any interannual variability in CO2 but that during the winter and spring the release of CO2 by the biosphere and atmospheric transport are more variable in time and more easily affected by regional or hemispheric variations in CO2. This can also be seen in Figure 4, where there tend to be larger and more frequent excursions from the filtered curve during the first half of the year than in the latter half.

Enting, I. G., The interannual variation in the seasonal cycle of carbon dioxide concentration at Mauna Loa, J. Geophys. Res., 92, 5497-5504, 1987.

In my opinion, with 25 more years of data, it is time to revisit both Enting and Thoning, Tans and Komhyr.

Besides, the peaks and troughs are intrinsic to the underlying physical and biological processes. From a certain point of view, the historical and continuing anthropogenic emission of CO2 (the long-term trend) is a nuisance because it complicates the task of estimating the natural sources and sinks. It seems to me that improving the estimate of the trend would help to improve estimates of the variation in the cyclical changes in the sources and sinks. At least, that seems to me a good place to start.

The SIO and MLO CO2 Data Series Compared

Thoning and Tans also discussed the Scripps CO2 (SIO) data series and how some of the SIO data were used to fill gaps in the Mauna Loa Observatory (MLO) series. The SIO weekly data extend from March 19, 1958, to the present. Therefore, at least in principle, there are 57 years of data from this locale. Inspection of the two data series revealed the following:

  • The MLO series (from May 1974) has more regular intervals of 7 days than the SIO series (from March 1958).
  • The MLO series has fewer gaps than the SIO series for the period May 1974 to the present.
  • The mean absolute difference between the series since May 1974 is about 0.5%.
  • The differences between the series are probably randomly distributed, but this remains to be verified.
  • A reasonable working hypothesis is that both series are drawn from the same population with normal sampling error but the MLO series has benefited from marginally better control.
  • Both data series use a calendar year of 365 days. The MLO week runs from Monday to Sunday and the SIO week from Sunday to Saturday. Thus, the two data series can be synchronized by aligning observations that are one day apart. (I am not certain, but I think the difference may be only 12 hours, between midnight and noon.)
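The comparison behind these points can be sketched in Python as follows. The sketch assumes that each weekly record is stamped with the first day of its week, so that an SIO Sunday pairs with the MLO Monday one day later, and that the percentage difference is taken relative to the SIO value. Both are assumptions, the function name is hypothetical, and the concentrations are made up.

```python
from datetime import date, timedelta

def mean_abs_pct_diff(mlo, sio, shift_days=1):
    """Mean absolute difference (% of SIO) over weeks present in both series.

    mlo and sio map a date to a concentration in ppm. Each SIO date is paired
    with the MLO date shift_days later; unmatched weeks are skipped.
    """
    diffs = []
    for d, s in sio.items():
        m = mlo.get(d + timedelta(days=shift_days))
        if m is not None:
            diffs.append(abs(m - s) / s * 100)
    return sum(diffs) / len(diffs)

# Made-up values: SIO weeks start on Sunday, MLO weeks on Monday.
sio = {date(2000, 1, 2): 400.0, date(2000, 1, 9): 400.0, date(2000, 1, 16): 400.0}
mlo = {date(2000, 1, 3): 401.0, date(2000, 1, 10): 399.0}  # third week missing
result = mean_abs_pct_diff(mlo, sio)
```

The same pairing also yields the signed differences, which would be the input for checking whether the differences are randomly distributed.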

Note: The year adopted by SIO and MLO is close to the tropical year. The International Union of Pure and Applied Chemistry and the International Union of Geological Sciences have jointly recommended using the length of the tropical year in the year 2000, approximately 365.24219 days.

The Gregorian calendar has 365 days in a regular year and, averaged over its leap-year cycle, 365.2425 days. The resulting difference from the tropical year, about 3 days in 10,000 years, is not the problem in aligning the weekly data. Rather, the number of weeks per year is the problem.

Some years have 53 weeks, and for analysis it is convenient to drop the 53rd week. But a 52-week year has only 364 days, whereas these series are based on 365 days. At one day per year, after 57 years the series would be 57 days, or roughly 8 weeks, out of synchronization. That’s the problem.

For analysis, my year overlaps two calendar years: I align the series so that the week of the vernal equinox is week 1. My year is 52 weeks long and extends into the following calendar year. For the SIO data, this adjustment results in a mean departure of the observation time from the vernal equinox of 0.07 ± 1.99 days, with a maximum departure of ±3 days. This variation in observation time is approximately equal to half of the 7-day period over which the daily observations were averaged.
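The alignment can be sketched in Python as below. The sketch assumes a fixed equinox date of March 20 (the true date varies by about a day), takes the weekly observation nearest to that date as week 1, and keeps the 52 observations starting there. The function name and the evenly spaced dates are hypothetical stand-ins for the real SIO sampling dates.

```python
from datetime import date, timedelta

def equinox_years(obs_dates, equinox_month=3, equinox_day=20, weeks=52):
    """Split weekly dates into 52-week years that start nearest the equinox.

    Returns {year: [52 dates]}. Years without an observation within 3 days of
    the assumed equinox date, or without 52 following weeks, are left out.
    """
    obs = sorted(obs_dates)
    out = {}
    for year in sorted({d.year for d in obs}):
        eq = date(year, equinox_month, equinox_day)
        start = min(range(len(obs)), key=lambda i: abs((obs[i] - eq).days))
        block = obs[start:start + weeks]
        if len(block) == weeks and abs((block[0] - eq).days) <= 3:
            out[year] = block
    return out

# Made-up sampling dates: exactly 7 days apart, starting March 19, 1958.
first = date(1958, 3, 19)
weekly = [first + timedelta(days=7 * i) for i in range(520)]
years = equinox_years(weekly)
departures = {y: (blk[0] - date(y, 3, 20)).days for y, blk in years.items()}
```

With strictly weekly sampling, week 1 never falls more than 3 days from the assumed equinox date. Whenever a year would contain a 53rd week, that week lands between two blocks and is dropped, as happens here between the 1959 and 1960 blocks.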


I expect that, after appropriate testing and verification, I will be able to obtain a series of 56 years of nominally continuous weekly observations of CO2. Prior to 1974, 19 gaps in the SIO data must be filled by interpolation to standardize the time between observations to 7 days. Inspection of the SIO data post-1973 suggests that very little interpolation will be needed.
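One simple way to fill such gaps is linear interpolation onto a regular 7-day grid, sketched here in Python. Linear interpolation is an assumption on my part and may not be the best choice for longer gaps. The function name and the values are made up.

```python
from datetime import date, timedelta

def fill_weekly(dates, values, step_days=7):
    """Linearly interpolate onto a regular grid that starts at dates[0].

    dates must be sorted, strictly increasing, and contain at least two entries.
    """
    grid, filled = [], []
    j = 0
    d = dates[0]
    while d <= dates[-1]:
        # Advance to the pair of observations that brackets the grid date d.
        while j < len(dates) - 2 and dates[j + 1] < d:
            j += 1
        d0, d1 = dates[j], dates[j + 1]
        w = (d - d0).days / (d1 - d0).days
        grid.append(d)
        filled.append(values[j] + w * (values[j + 1] - values[j]))
        d += timedelta(days=step_days)
    return grid, filled

# Made-up record with two missing weeks between January 9 and January 30.
dates = [date(2000, 1, 2), date(2000, 1, 9), date(2000, 1, 30)]
values = [369.0, 369.3, 370.2]
grid, filled = fill_weekly(dates, values)
```

The same routine also shifts slightly irregular sampling dates onto the 7-day grid, because it interpolates at every grid date and not only inside gaps.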

For the initial analysis, I intend to work with the annual maxima and annual minima. Using the SIO series will permit me to apply a version of the fast Fourier transform (FFT) that requires the length of the series to be a power of 2. This can be achieved by padding the series to 64 years (2 to the power of 6). The standard padding method is to extend the series using zeros.
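The padding step itself is simple, as this Python sketch shows. The function names are hypothetical and the 56 placeholder values stand in for the annual maxima or minima.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(series):
    """Extend a series with zeros up to the next power-of-2 length."""
    target = next_power_of_two(len(series))
    return list(series) + [0.0] * (target - len(series))

# Placeholder values: 56 annual figures, one per year.
annual = [float(i) for i in range(1, 57)]
padded = zero_pad(annual)
```

In practice the mean or the trend is often removed before padding, so that the jump from the last real value to zero does not add a spurious step to the spectrum.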

But I wonder if this series is so regular that other approaches might be possible. Study of the literature will take up to 3 months. This is an essential step because several papers have passed peer review even though the numerical techniques were faulty to the extent that the authors reported their artifacts as scientific results.

There is not much point in exerting a lot of effort in data preparation and analysis only to cause confusion by using faulty procedures.

In process ….