Earth as Waterworld Part I: The Movie

Waterworld: The Movie

Waterworld is a 1995 science fiction film set a few hundred years in the future, at a time when the polar ice caps have melted and the land areas of the world have been submerged under the oceans, a modern version of Noah’s Flood. The film stars Kevin Costner. When director Kevin Reynolds quit, Costner took on the job of directing the film.

I enjoyed this film because for the most part I was able to suspend disbelief in the science. Not everyone was able to do that. Kevin Carr was not impressed by the liberties taken with the science.

Kevin Carr commented that in Waterworld, sea level rises above the level of Denver, Colorado, elevation 1,609 meters (one mile) above sea level. Referring to a National Geographic map, Carr said, “The polar ice caps, which rest on land, can affect sea levels if they melted. However, there is only enough ice to raise the overall sea level about 62 meters (or a little more than 200 feet).”

Could the Earth Become a Waterworld?

My thesis here is that Earth is already a waterworld. The National Oceanic and Atmospheric Administration (NOAA) has confirmed what Kevin Carr said and points out that the timeline for this event is rather vague: “probably more than 5,000 years, some scientists say”.
NOAA: If All Ice Melted. See the tab “About the Science”.
Based on the volumes of land ice, the United States Geological Survey (USGS) estimated the potential sea level rise as 80 meters (about 260 feet). In my opinion, the estimate by the USGS is more reliable than that given by National Geographic.

How the Movie Begins

Waterworld opens to show a revolving globe with rising sea level. The link I provide below shows the entire Continental US being submerged, including the Appalachian Mountains. The clip stops at the title scene to avoid infringing copyright.
Link to the movie Waterworld

New York City

New York City is entirely submerged under the ocean and Costner has to dive deep into the sea to reach street level. So where is the Empire State Building?
The real Empire State Building stands on a site 15 meters above sea level, so an 80-meter rise in sea level would reach 65 meters above ground level. The building has 102 floors from the ground to the roof and is 380 meters tall. That means Costner should have been able to enter a window on about the 17th floor.
Sackets Harbor, near Oswego, New York, is located at the eastern end of Lake Ontario at an elevation of 87 meters. According to the USGS, there is enough water locked up in the ice to raise the world ocean almost to that level and no higher.
However, the elevation of Oswego is low enough that it did not prevent the construction and opening in 1825 of the Erie Canal, which passes near Oswego on the way to Buffalo. The elevation of the river water level at Albany is 1 meter above sea level, and the elevation of Buffalo is 183 meters. Buffalo is at about the same elevation as the middle floor of the Empire State Building, the 51st floor.
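The floor arithmetic can be checked in a few lines of Python (the figures are those quoted above, and the floors are assumed to be of equal height):

```python
# Rough check of the Empire State Building waterline arithmetic.
# All figures are the ones quoted in the text above.
site_elevation_m = 15      # elevation of the building's site above sea level
sea_level_rise_m = 80      # USGS estimate for a total melt
building_height_m = 380    # ground to roof
floors = 102

water_above_ground_m = sea_level_rise_m - site_elevation_m   # 65 m
floor_height_m = building_height_m / floors                  # ~3.7 m per floor
waterline_floor = int(water_above_ground_m / floor_height_m)
print(waterline_floor)  # 17
```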

The Statue of Liberty

In his blog, Anthony Watts, a meteorologist, used the National Geographic estimate of sea level rise. Here I will continue to use the USGS estimate of 80 meters. Watts estimated the height of the site of the Statue of Liberty at about 3 meters.
So where would the waterline be if sea level were to rise 77 meters above ground level?
Watts gives measurements from the National Parks Service: ground level to the tip of the torch is 93 meters. The waterline would be 16 meters below the tip of the torch, a little below the level of Liberty’s chin.
Watts makes a point about the timeline, “How long will it take to reach the NatGeo waterline in the cover photo?”
Watts estimated over 23,000 years at the present rate of sea level rise at New York and then said, “A new ice age will likely be well underway then, dropping sea levels. The water would never get there.”
(This is because a new ice age would again lock up sea water in ice caps. In the last 2.5 million years there have been about 50 periods of advancing glaciers, most recently at intervals of about 100,000 years.)
Following the assumption that all polar ice could melt and using 80 meters instead of 65 meters, I estimate the time to melting of the ice caps at about 30,000 years. If sea level rises twice as fast in the future, that would give about 15,000 years.
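The scaling behind these estimates is simple; here is a sketch in Python, assuming Watts’s 23,000-year figure and a linear scaling from the roughly 62-meter National Geographic rise to the 80-meter USGS rise:

```python
# Scale Watts's ~23,000-year timeline, which was based on the smaller
# National Geographic rise, up to the 80 m USGS estimate, keeping the
# present rate of sea level rise constant.
watts_years = 23_000
natgeo_rise_m = 62   # the estimate quoted by Kevin Carr
usgs_rise_m = 80

years_to_melt = watts_years * usgs_rise_m / natgeo_rise_m
print(round(years_to_melt))      # 29677, i.e. about 30,000 years

# If sea level were to rise twice as fast in the future:
print(round(years_to_melt / 2))  # about 15,000 years
```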

What, Me Worry?

So whatever assumptions we use, the timeline for melting of the polar icecaps is over 10,000 years.
On the web page cited above, NOAA advised teachers that, “Students may find the one-shot 216-foot rise either too large and too distant in time to bother about, or apocalyptic.”
(“Apocalyptic” refers to the Apocalypse, featuring four bogeymen on horses. Such catastrophic events were used by the ancients before science fiction was invented.)
NOAA and Watts agree that the issue is the timeline. Watts makes the additional point that the long-term threat may be the advancing continental glaciers, not sea level rise.

Waterworld’s Reception

The film was not a box-office success in America but international sales and spin-offs probably allowed the studio to break even.
However, I don’t pay attention to box-office success and have seen the film twice. I would watch the film again for the scenery, the action, and a few laughs at other science goofs, which I won’t mention in case I spoil the film for readers who may decide to view it.

My Conclusion

I conclude that no matter how absurd a proposition may be, it can be turned into a learning experience.
Wikipedia says about waterworlds that, “Earth is the only known astronomical object to have bodies of liquid water on its surface, although several exoplanets have been found with the right conditions to support liquid water.” (Exoplanets are planets outside our solar system.)
Waterworld Part 2 will examine the figure of the Earth as a water planet. Part 2 will take a different approach than Wikipedia.


Hemispheric Temperature Anomalies: 1851-1980

Global Temperature Averages by Hemisphere

Climate Research Unit, University of East Anglia

The Climatic Research Unit (CRU) was established in the School of Environmental Sciences (ENV) at the University of East Anglia (UEA) in Norwich in 1972.

The CRU has collected, collated and archived global climate data for over 40 years.

CRU temperature data

Stanley Grotch

In 1987, the American Meteorological Society published a paper by Stanley Grotch of the Lawrence Livermore National Laboratory, University of California, assessing the robustness of the CRU land dataset and other datasets.


Three data bases of gridded surface temperature anomalies were used to assess the sensitivity of the average estimated Northern Hemisphere (NH) temperature anomaly to: 1) extreme gridpoint values and 2) zonal band contributions. Over the last 100 years, removal of either the top or bottom 10% of the gridpoint anomalies in any year changes the estimated NH average anomaly by 0.1−0.2°C. Excising extensive zonal bands also produces root-mean-square changes in the estimated NH anomaly of approximately 0.1°C. The estimated NH average anomaly appears to be robust to such perturbations.

Some Considerations Relevant to Computing Average Hemispheric Temperature Anomalies (Grotch, S. L., 1987)

Download PDF

Grotch cited the paper by Jones that described the land and ocean datasets: “A new compilation of monthly mean surface air temperature for the Northern Hemisphere for 1851-1984 is presented based on land-based meteorological station data and fixed-position weather ship data.”

PDF: JONES, P. D., et al. Northern Hemisphere surface air temperature variations: 1851-1984. Journal of Climate and Applied Meteorology, 1986, 25.2: 161-179.

My comment:

Grotch’s paper claimed that the land (CRU) and ocean (COADS) datasets pass his tests of normality and freedom from bias. His presentation is reasonable.

(Unfortunately, Grotch’s paper seems to be no longer accessible even via Google Scholar.)

However, his Figure 1 shows 26,000 data points for the Northern Hemisphere (NH) that range between plus and minus 2 degrees Celsius. By contrast, the signal (the mean temperature) ranges from approximately -0.2 C to +0.2 C over a period of 130 years, a rate of about 0.3 C per century.

The question arises: What is the standard error of the estimate? How much confidence can be placed in the vertical location of a point on the black line when the data range is ten times greater than the signal range? We do not know.
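The question can at least be framed numerically. A back-of-envelope sketch, assuming (unrealistically) that the gridpoint anomalies within a year are independent, with a standard deviation of about 1 C read off Figure 1:

```python
import math

# Back-of-envelope standard error of a yearly NH mean, under the
# unrealistic assumption that all gridpoint anomalies in a year are
# independent. The figures are those quoted in the text above.
total_points = 26_000
years = 130
sd_gridpoint = 1.0   # assumed std. dev.; the scatter is about +/- 2 C

points_per_year = total_points / years                  # 200
se_of_yearly_mean = sd_gridpoint / math.sqrt(points_per_year)
print(round(se_of_yearly_mean, 3))  # 0.071
```

Because neighbouring gridpoints are strongly correlated, the effective sample size is much smaller than 200 and the true standard error is correspondingly larger; without knowing the spatial correlation we cannot pin it down, which is the point.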

Fig 1

The NH temperature increase from 1885 to 1940 was about 1.0 C, about double the increase over the whole period 1851-1984. The biggest change in temperature came before 1950, when CO2 began to be emitted at modern levels. This rise in temperature of 1 degree C cannot have been man-made climate change, but merely a natural fluctuation in temperature.

The lowest average temperature in the NH was around 1875. The periods 1875-80 to 1935-40 and 1935-40 to 1995-2000 are both 60 years, approximately the same as the period of the Atlantic Multidecadal Oscillation (AMO). However, we do not know how much of the temperature change on land observed during the last 120 years was a natural response to oceanic oscillations, including the AMO.

Since no warming was observed between 1940 and 1980, and since little or no warming has been observed since about 1995-2000 (apart from El Ninos), the 15-year period from about 1980 to 1995 is our strongest, and perhaps only, evidence for a long-term trend towards global warming.

But if the warming from 1980 to 1995 was related to the warm phase of the AMO, then we can expect, first a peaking in the cycle, and then a gradual downturn in the AMO, which may have already occurred but has been masked by El Nino events.

The air temperature data recorded at sea (COADS) from the period 1850-1980 revealed no global temperature trend. Grotch did not elaborate on the weaknesses of the COADS air temperature dataset, but the situation is clear from his paper.

Sea Surface Temperature

The temperature of the sea itself is not shown. Until the Argo system began to be deployed in 2000, oceanographers had to rely on ships at sea to measure the temperature of the world ocean. The ocean covers 70% of the Earth’s surface and the thermocline extends to a depth of about two kilometers. Thus, most of the heat in the biosphere is stored in and released from the ocean.

By 2007, several thousand Argo buoys were recording ocean temperatures. Until then, the error bars on ocean temperature estimates were too large relative to the size of the anomalies to inspire much confidence. Argo buoys

World Fuel Consumption and Global Temperature Anomaly

Nevertheless, the CRU did produce annual estimates of global average temperature using a combination of land and sea-surface temperature (SST) data. At the time Grotch prepared his paper, world use of fossil fuels was also known.

It would have been possible to raise the issue raised by Klyashtorin and Lyubushin in their paper On the coherence between dynamics of the world fuel consumption and global temperature anomaly.

Fossil fuel use vs temperature change (Figure 12)

The graph compares worldwide fossil fuel use and global temperature over about 140 years, including those covered by Grotch’s paper. As I recall, the paper gave the correlation coefficient between fossil fuel use and global temperature change as r = 0.3. The coefficient of determination (r-squared) was therefore about 0.09, indicating that variance in fossil-fuel use accounted for about 9% of the variance in global average temperature.

On this flimsy evidence, a vast worldwide initiative was started in 1992 with the aim of regulating the temperature of the Earth: the United Nations Framework Convention on Climate Change.


The publication in an obscure minor journal of the AMS effectively buried Grotch’s paper until Richard Lindzen displayed his graph of CRU temperatures in his lecture,

Global Warming, Lysenkoism & Eugenics, Prof. Richard Lindzen, at 30:37.





Antarctic Temperature Anomalies

RSS (satellite) Record

The start date of the satellite record is 1979. It appears that Antarctic temperature peaked around 1982 and has declined slightly since then.

RSS Temp Antarctica


Satellite temperature is actually a measure of the level of radiative energy of the atmosphere. The anomalies show the changes in the level of energy at the measured wavelengths. As a consequence, the integral of the changes may indicate the direction of change.

RSS Integral Temp Antarctica

The integral of temperature may not have an obvious physical meaning apart from the vibrational energy of the molecules in the atmosphere. However, in some years a higher temperature indicates that energy is being added to the oceans, while in other years energy is being lost.

The integral of temperature is a proxy for the running total of net energy lost and gained, and it indicates the direction of change more clearly than the temperature itself does.
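A minimal sketch of the proxy, using a short hypothetical series of anomalies:

```python
from itertools import accumulate

# Running total (a discrete integral) of temperature anomalies as a
# proxy for net energy gained or lost. The anomaly values here are
# hypothetical, for illustration only.
anomalies = [0.2, 0.1, -0.1, -0.3, -0.2, 0.1, 0.3, 0.2]
running_total = [round(c, 2) for c in accumulate(anomalies)]
print(running_total)  # [0.2, 0.3, 0.2, -0.1, -0.3, -0.2, 0.1, 0.3]
```

The anomaly series oscillates around zero, but its running total drifts, which is what makes the direction of net change easier to see.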


Glacier response to North Atlantic climate variability during the Holocene

The Kulusuk Glacier during the Holocene

Kulusuk Lake, Greenland

Kulusuk Lake is located on Kulusuk Island, off the east coast of Greenland at about the same latitude as northern Iceland. The lake was the subject of a recent paper.

Glacier response to North Atlantic climate variability during the Holocene. N. L. Balascio, W. J. D’Andrea and R. S. Bradley, Clim. Past Discuss., 11, 2009-2036, 2015

The authors present a graph that shows the relative rates of change in the glacier in terms of advance and retreat since 10,000 years ago.

From about 8,000 to 4,000 years ago (during the Holocene Climate Optimum) it was too warm for a glacier to form.

After about 4,000 years ago, a glacier formed and began a series of advances and retreats up to the present as shown in their graph.

I plotted their graph and then wondered what it would look like if I accumulated the advances and subtracted the retreats. The figures are relative, not absolute. Thus the resulting curve shown below does not tell us the total mass of the glacier at any point; it merely indicates how the glacier grew in mass from 4,000 years ago (2000 BC) to its greatest extent around 1,200 years ago (800 AD), then began to lose mass until the Medieval Warm Period around 1,000 years BP.

After 1000 years BP the glacier gained mass during the Little Ice Age, which ended around 200 years ago. Since then the Earth’s climate has been getting warmer and the glacier has retreated again. Still the glacier seems to have a long way to go before reaching its extent at the end of the Roman Warming Period around 1500 years ago.



Points to Note

1. Declines in the curve correspond to increasing total ice mass and thus cooling.

2. The shape of the curve for the 2000-year period from 4000 BP (before the present) to 2000 BP is similar to the shape of the curve for the 2000-year period from 2000 BP until the present. The correlation coefficient is 0.78 and R-squared is 0.60 with 21 pairs of data, significant at the 1% level.

A possible explanation is given by Nicola Scafetta.

Scafetta, Nicola. “Multi-scale harmonic model for solar and climate cyclical variation throughout the Holocene based on Jupiter–Saturn tidal frequencies plus the 11-year solar dynamo cycle.” Journal of Atmospheric and Solar-Terrestrial Physics 80 (2012): 296-311.

3. The glacier gained mass after 4000 years before the present (BP) and then lost it by 3800 BP, a date that corresponds to the Minoan Warm Period. The glacier then gained mass until about 3200 BP, a time that corresponds to the end of Mycenaean civilization.

4. The glacier then lost mass until about 1500 BP, during the period that corresponds to the Roman Warm Period.

5. The glacier gained mass for 300 years after 500 AD (1500 BP) during which Roman civilization declined.

6. From around 700-800 AD (1200 BP) to 1000 AD (1000 BP) the glacier lost mass due to warming. This would correspond to the Medieval Warm Period in Europe.

7. During the period from 1500 BP to 1000 BP, the general increase in the mass of the glacier was interrupted by a significant retreat around 1300 BP, before the glacier retreated to its furthest point around 1000 BP. This indicates that cooling was not uniform between 1500 and 1000 BP but was broken by a warm interval around 700-800 AD.

8. After 1000 AD the glacier gained mass during the period known as the Little Ice Age, a gain that was reversed by warming after about 1800 AD.

9. Since 1800 AD (200 BP) the glacier has lost the mass gained after the year 1000 AD, but it has not yet shrunk back to its extent of 700-800 AD (about 1200 BP).
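The significance arithmetic in point 2 can be checked with the usual t-test for a correlation coefficient; a sketch in Python:

```python
import math

# Check of the significance claim in point 2: r = 0.78 with 21 pairs.
r = 0.78
n = 21
r_squared = r * r                                     # ~0.61
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
print(round(t_stat, 2))  # 5.43
# For 19 degrees of freedom the two-tailed 1% critical value of t is
# about 2.86, so the correlation is indeed significant at the 1% level.
```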


It is not certain to what extent changes in glacier mass are driven by changes in temperature or changes in precipitation.

Wind blowing over a glacier can cause sublimation of ice directly to vapour without an intervening liquid stage. The Kilimanjaro Glacier is known to respond to wind in this way. Whether the Kulusuk Glacier responds this way is apparently not known. If it does, the mass of the glacier would be less than expected from melting caused by temperature change alone.

Therefore this attempt at correlation between the mass changes of a Greenland glacier and European climate is merely suggestive. Nevertheless, the analysis is consistent with that of H. H. Lamb.

As for the Greenland settlements of about 1000 AD, it seems that Greenland may have recovered from the very severe climate conditions that prevailed 200 years before settlement began in 985 AD (1030 BP), so that by around 1000 AD the island appeared suitable for settlement.

The settlers could not have known that climate would deteriorate for hundreds of years during the Little Ice Age, just as we do not know if climate will again deteriorate and the glaciers again advance.

The Gleissberg Cycle of Solar Activity: Part 2

Sunspot Groups and the Gleissberg Cycle

In Part 1, sunspot numbers observed from 1750 to 2015 were used to display both the 11-year cycle and the century-long Gleissberg Cycle.

Here I plot the group sunspot numbers from 1610 to 2015 to include the Maunder Minimum, a period when few sunspots were observed, named after the husband-and-wife team of Edward Walter Maunder and Annie Scott Dill Maunder. (The Maunders studied, but did not themselves observe, the period now called the Maunder Minimum.)

GN anomaly

As before, the data source is WDC-SILSO, Royal Observatory of Belgium, Brussels. The data is version 2, the new series. The mean value of 3.78 has been subtracted from each data point in order to show the anomalies (departures from the mean). The plot shows the anomalies, not the absolute values.

The Maunder Minimum is the period during the 17th century when few observations exceeded the mean. The Dalton Minimum is a similar period of shorter duration from the end of the 18th century to the early 19th. Not so obvious is a period centered around 1910 when sunspots were less numerous than usual.

Cumulative Group Anomalies

I accumulated the data series: the value of the second data point is the sum of the first two values, the third data point adds the third value to that running total, and so on. The cumulative series rises while the Group Number is above the mean and declines while it is below the mean.

The graphic displays a metric for CUMULATIVE number of sunspot groups.

Cumulative GN

From the start date, the cumulative total increases and then declines until the Maunder Minimum. The cumulative value then rises during the 18th century, declines until the Dalton Minimum, rises during the first half of the 19th century, declines until around 1940, and then rises until the end of the 20th century.

I do not see a centennial-scale cycle here. The first cycle seems closer to 200 years, which would make it a Suess cycle (or de Vries cycle). This is why I think of the Gleissberg and Suess/de Vries cycles as pseudo-cycles.

The lack of regularity can mean either that the phenomenon is chaotic (like turbulence) or that other factors are operating, such as changes in gravity acting upon the Sun caused by the orbital motions of the planets.

[Annie Maunder noted the apparent association of sunspot appearances with the motions of the inner planets.]

I interpret the long sunspot cycles as indications of increases and decreases of the amount of energy entering the oceans and the ice caps of the Earth. The reason is that the atmosphere and uppermost few meters (yards) of land can store very little energy. I mention “ice” because when ice melts it absorbs a lot of energy as latent heat. So during the upward swing of the Gleissberg Cycle I expect an increase in solar energy entering the oceans in the tropics and melting ice at or near the poles.

The controversial aspect is the claim that changes in solar activity have a significant impact on the climate of the Earth. This question I leave until later in the series.

The model presented here shows a proxy (substitute) for solar activity. There are other proxies for solar activity that can be used alone or as “predictors” for sunspots. These I will discuss in Part 3 of this series.

First I ask the question, “Are counts of sunspot groups consistent with counts of sunspot numbers?” The reason I ask this question is that Group Number (GN) and the Sunspot Number (SSN) are two related but different proxies for solar activity.




The Gleissberg Cycle of Solar Activity: Part 1

What is the Gleissberg Cycle?

The Gleissberg cycle of solar activity spans roughly 80 to 90 years. Because it is not regular, it is probably better described as quasi-periodic or as a pseudo-cycle.

You can easily discover it for yourself.

How to Plot the Gleissberg Cycle

I found the newly corrected sunspot series here: SILSO sunspot data (by remembering the name “Clette”, a scientist who has been doing research in this field for a long time).

I entered the monthly data into a spreadsheet and plotted it.

SSN 1750-2015

The 11-year cycle is apparent but the Gleissberg Cycle is not.

To display the Gleissberg Cycle, I standardized the values as described below and then cumulated the standardized values. In the following graph, the 11-year cycles appear in clusters of increasing or decreasing numbers of sunspots. A full cycle up and down is defined as a Gleissberg Cycle after Wolfgang Gleissberg.


To obtain Z-scores, I calculated the mean and standard deviation (using the spreadsheet functions AVERAGE() and STDEV()). The Z-score for each month is then defined as:

Z = (monthly value − mean) / standard deviation


The key to the graph is this: after calculating the Z-scores, I cumulated them, recording the first Z-score by itself, then adding the second Z-score to it for the second data point, and so on. At some points the up scores and down scores should cancel, the cumulative value should be zero, and the line should cross the X-axis.
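In a script rather than a spreadsheet, the whole procedure takes only a few lines; a sketch in Python, using a short hypothetical series of monthly sunspot numbers:

```python
import math
from itertools import accumulate

# Standardize a series to Z-scores, then cumulate them, as described
# above. The monthly sunspot numbers here are hypothetical.
ssn = [58.0, 120.5, 30.2, 5.1, 44.7, 190.3, 81.6, 12.4]

mean = sum(ssn) / len(ssn)
# Sample standard deviation, matching the spreadsheet STDEV() function.
sd = math.sqrt(sum((x - mean) ** 2 for x in ssn) / (len(ssn) - 1))

z_scores = [(x - mean) / sd for x in ssn]
cumulative_z = list(accumulate(z_scores))
# The cumulative line rises while values sit above the mean, falls
# while they sit below it, and crosses zero where ups and downs cancel;
# by construction the full series of Z-scores sums to zero.
```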

This method is very crude because the vertical position and the location of the high and low points depend on the starting value. The starting point itself is arbitrary, determined by the point in time when observers started recording carefully and regularly enough for today’s scientists to report sunspot numbers instead of sunspot group numbers.

One way to improve the graph would be to add 100 years or so of data, going back to the Maunder Minimum, a period of 70 years from 1645 to 1715 when few sunspots were observed.

Because the Maunder Minimum was such a long period with few sunspots the graph would start around zero and a starting point near zero would be roughly correct according to our best current knowledge.


The 11-year solar cycles can be seen, usually 4 to 7 one after the other, increasing and then decreasing. The peaks occurred around 1790, 1880, and 2005, that is, 90 and 125 years apart. The troughs are 105 years apart.

The average of these intervals (90, 125 and 105) is 107 years, rather longer than the usual value cited of 87 or 88 years.

The cause of the Gleissberg Cycle is speculative. Possibly the process is chaotic, resulting from changes in magnetism deep in the Sun that cause turbulence at the surface. The chaotic nature of the cycle may result from two dynamos within the Sun that are interacting. The Gleissberg Cycle may be displaying the beat frequency of these two dynamos. Alternatively, the gravitational pull of the planets may affect the Sun’s activity.

Next I will try to improve the graphic by adding more data. SILSO provides annual data back to the 17th century.


To analyze the graphic we should keep in mind that it displays a metric for CUMULATIVE number of sunspots.

I interpret this as long-term (century) increases and decreases of the amount of energy entering the oceans and the ice caps of the Earth. The reason is that the atmosphere and uppermost few meters (yards) of land can store very little energy. I mention “ice” because when ice melts it absorbs a lot of energy as latent heat.

So during the upward swing of the Gleissberg Cycle I expect a lot of solar energy entering the oceans in the tropics and a lot of melting ice at or near the poles.

Ocean currents tend to distribute energy from tropical to polar regions. So I would expect to see ice melting in the Arctic Ocean. I would expect to see less ice melting in the Antarctic. Why?

First, the southern oceans are bigger and so the heat is distributed less densely.

Second, the existing ice is more extensive than in the Arctic, and therefore the Antarctic ice reflects more light back into space, preventing it from being converted into heat energy. In effect, the 100-year length of the Gleissberg Cycle is not long enough to have much impact on the Antarctic ice cap.

Further, ice in the Antarctic is mostly land-based and is therefore less affected by ocean heating. By contrast, ice at the margins that is not grounded on the seafloor would be subject to increased melting.

These theoretical predictions are confirmed by observations.

Finally, I conclude that even if one solar cycle of 11 years does not greatly affect climate, a series of 5 sunspot cycles trending either up or down may be expected to have a cumulative effect. What makes the effect so difficult to determine is that ocean currents must be integrally involved, because only the oceans can store solar energy for 50 years and then release it.

The oceans therefore act as a low-pass filter for variability in solar output.

There is an interesting paper by Nir Shaviv, Using the Oceans as a Calorimeter to Quantify the Solar Radiative Forcing. He estimated that the variation in ocean temperature within one solar cycle is about 0.1 degree Celsius.

We can see from inspection of the Gleissberg Cycle graphic that the span from 1935 to 2005 is about 70 years and covers 7 solar cycles, a duration of 10 years per cycle. The vertical span of the upward swing is 7 bars, so each cycle averages one vertical bar.

If each cycle from top to bottom represents 0.1 degree Celsius, that would give a span of 0.7 degrees Celsius between 1935 and 2005. This may be compared with the rise in ocean heat content, estimated to have raised the temperature by about 0.08 degrees Celsius per decade, or 0.56 degrees C in 70 years (1935-2005). This rough comparison suggests that from 1935 to 2005 the Earth’s oceans should have heated by 0.7 degrees Celsius from excess solar energy alone, whereas the oceans heated by only 0.56 degrees Celsius.

This analysis seems to be flawed because that would be too great a rise in temperature; the variation in one 11-year cycle may overlap the variation in the next. Assuming only 50% of the figure estimated by Dr Shaviv (0.05 degrees Celsius per cycle), the heating from solar variation over 7 cycles would be 0.35 degrees Celsius, about 60% of the total observed oceanic heating.
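The arithmetic of the last three paragraphs can be collected in one place (the 0.1 C per cycle is Shaviv’s estimate; the 0.08 C per decade is the observed ocean figure used above):

```python
# Collected arithmetic for the 1935-2005 comparison. The figures are
# those quoted in the text above.
shaviv_per_cycle_c = 0.1     # Shaviv's variation per 11-year solar cycle
cycles = 7                   # solar cycles between 1935 and 2005

solar_c = shaviv_per_cycle_c * cycles     # 0.7 C if the cycles simply add
observed_c = 0.08 * 7                     # 0.08 C/decade over 7 decades = 0.56 C

# If only 50% of Shaviv's figure is effective (overlapping cycles):
halved_c = 0.5 * shaviv_per_cycle_c * cycles   # 0.35 C
share = halved_c / observed_c                  # ~0.625, i.e. about 60%
```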

This analysis suggests that the percentage of total heating by increased solar activity between 1935 and 2005 was between 60% and 100%.

There remains the possibility that much of the increase in solar energy:

… was reflected back into space by ice and clouds

… melted ice without increasing temperature

… evaporated water at the surface that rose into the upper troposphere and was there emitted into space

Or it may be that sunspots are not so tightly correlated with the Sun’s energy flux as hitherto believed.

Still, the Gleissberg Cycle is important because at least some approaches to analyzing sunspot cycles suggest that solar energy accumulates over many cycles, sufficiently to affect global warming.

This is a highly controversial subject that I leave until later in this series.

Part 2 will attempt to extend the analysis back to 1610. This will correct some of the bias at the beginning of the series because the zero point on the graph will coincide with a time when there were few sunspots observed.

Statistics with R Using Atmospheric Gases: Part 4 Reviewing the Scientific Literature


Literature Review of the Mauna Loa CO2 Series

I deliberately explored the data before delving into the literature. But even so, I found it difficult to set aside general knowledge acquired from study of geoscience.

Pieter Tans of the ESRL provides references to the two most important seminal papers.

Related papers

The literature is substantial, especially since the measurement of atmospheric gases now covers several more species of gas collected by a global network of observatories. Charles Keeling was a leader in the field for many years.

There is now a substantial literature based on data series from the observatories in the global network, useful to give perspective to other oceanic and land influences in addition to North America and northern Europe.

Google Scholar is a virtual Who’s Who in the study of CO2 worldwide.

Lessons from Thoning, Tans and Komhyr 1989

The literature is extensive, too extensive to review here. I conclude that all my exploratory observations were discovered by Keeling and others more than 40 years ago. The techniques of separating the trend from the annual cycle are various and complex.

The record of CO2 at Mauna Loa is basically a combination of three signals: a long-term trend, a non sinusoidal yearly cycle, and short-term variations from several days to several weeks that are due to local and regional influences on the CO2 concentration. (Thoning, Tans and Komhyr, 1989)

The analysis must therefore be complex as illustrated by the methodology used to decompose the data:

The curves used for selecting data and calculating monthly and annual means were obtained using a procedure described by Thoning et al. [1989]. Briefly, the curves are a combination of a quadratic fit to the trend and a fit of sines and cosines to the annual cycle and its first three harmonics. The residuals from these fits are then digitally filtered with a filter having a full width half maximum cutoff at 40 days to remove high-frequency variations. The results of the filtering are then added to the fitted curves. At this point the residual standard deviation of the points from the curve is calculated, and points lying more than +3 [sigma] from the curve are flagged as not representative of background or regionally well-mixed conditions. The procedure is repeated on the unflagged values until no more points are flagged. (Conway, Tans, Waterman, and Thoning, 1994)

This paper describes an approach using Fourier analysis, followed by filtering in the frequency domain and then reversing the process to convert the smoothed data back to the time domain. To discover something that has not already been discovered seems a daunting task, which is the reason Einstein said, “If at first the idea is not absurd, then there is no hope for it.”
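A minimal sketch of that curve-fitting step, on a synthetic series rather than the real Mauna Loa data:

```python
import numpy as np

# Sketch of the fit described above: a quadratic trend plus the annual
# cycle and its first three harmonics, fitted by least squares.
# The CO2 series here is synthetic, for illustration only.
t = np.arange(120) / 12.0                            # 10 years, monthly, in years
co2 = 350.0 + 1.5 * t + 3.0 * np.sin(2 * np.pi * t)  # synthetic data

# Design matrix: 1, t, t^2, then sine/cosine pairs for the annual
# cycle (k = 1) and its first three harmonics (k = 2, 3, 4).
cols = [np.ones_like(t), t, t ** 2]
for k in range(1, 5):
    cols.append(np.sin(2 * np.pi * k * t))
    cols.append(np.cos(2 * np.pi * k * t))
X = np.column_stack(cols)

coeffs, *_ = np.linalg.lstsq(X, co2, rcond=None)
fitted = X @ coeffs
residuals = co2 - fitted   # these would then be low-pass filtered
```

In the published procedure the residuals are digitally filtered and the result added back to the fitted curve; the sketch stops at the fit itself.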

I do have an absurd idea, a couple of absurd ideas, in fact. The most absurd of my ideas is to use the annual minima and maxima to estimate the trend. My ideas are inspired by comments made by Thoning, Tans, and Komhyr in 1989.

It can be seen from Figure 8 that the annual cycle has the same basic shape from year to year, although with some small but significant variations. For example, the peaks of the cycles can vary from a sharp point to a more rounded shape…. The mean peak-to-peak amplitude for the 12 years from 1974 to 1985 was 6.77 ppm, with a standard deviation about the mean of 0.32 ppm.

Enting [1987] found a correlation between the peak heights for each spring and the following fall for SIO [Scripps Institution of Oceanography] monthly mean data from 1960 to 1981. Low-amplitude peaks in the spring were followed by low-amplitude troughs in the following fall. He did not find any correlation between the fall troughs and the following spring peak. If we plot the absolute values of the maximum and minimum values for the seasonal cycle (Table 3) in a manner analogous to that of Enting, we find a correlation opposite to that stated by Enting. We see no correlation between the size of the peaks and troughs in the same year (correlation coefficient = 0, Figure 11a), but we do find a correlation for the size of the fall troughs followed by the spring peak…(pp. 8559-8556).

The dates at which the minima of the annual cycle occur are more consistent than the dates of the maxima. The dates at which the seasonal cycle crosses the trend line are also more consistent for the drawdowns in July than for the increases in January. Peterson et al. [1986] found a similar consistency for the continuous CO2 measurements at Barrow, Alaska. This suggests that the forces that drive the summer CO2 drawdown in the northern hemisphere are stronger and more regular than any interannual variability in CO2 but that during the winter and spring the release of CO2 by the biosphere and atmospheric transport are more variable in time and more easily affected by regional or hemispheric variations in CO2. This can also be seen in Figure 4, where there tend to be larger and more frequent excursions from the filtered curve during the first half of the year than in the latter half.

Enting, I. G., The interannual variation in the seasonal cycle of carbon dioxide concentration at Mauna Loa, J. Geophys. Res., 92, 5497-5504, 1987.

In my opinion, with 25 years more data, it is time to revisit both Enting and Thoning, Tans and Komhyr.

Besides, the peaks and troughs are intrinsic to the underlying physical and biological processes. From a certain point of view, the historical and continuing anthropogenic emission of CO2 (the long-term trend) is a nuisance because it complicates the task of estimating the natural sources and sinks. It seems to me that improving the estimation of the trend would contribute to improving estimates of the variation in the cyclical changes in the sources and sinks. At least, that seems to me a good place to start.
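
As a first sketch of this "absurd" idea (Python, on synthetic data, not the real series): the midpoint between each year's maximum and minimum tracks the trend without fitting any curve to the cycle at all.

```python
import numpy as np

rng = np.random.default_rng(1)
years, wpy = 20, 52
t = np.arange(years * wpy) / wpy
series = 330 + 1.7 * t + 3.0 * np.sin(2 * np.pi * t) + rng.normal(0, 0.2, t.size)

# One trend estimate per year: the midpoint between that year's
# maximum and minimum
mid = np.array([(series[i * wpy:(i + 1) * wpy].max()
                 + series[i * wpy:(i + 1) * wpy].min()) / 2
                for i in range(years)])

slope = np.polyfit(np.arange(years), mid, 1)[0]   # ppm per year
```

On this synthetic series the slope of the midpoints comes out close to the 1.7 ppm per year used to generate the data.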

The SIO and MLO CO2 Data Series Compared

Thoning and Tans also discussed the Scripps CO2 (SIO) data series and how some of the SIO data was used to fill gaps in the Mauna Loa Observatory (MLO) series. The SIO weekly data extends from March 19, 1958 to the present. Therefore, at least in principle, there is data from this locale for 57 years. Inspection of the two data series revealed the following:

  • The MLO series (from May 1974) has more regular intervals of 7 days than the SIO series (from March 1958)
  • The MLO series has fewer gaps than the SIO series for the period May 1974 to the present.
  • The mean absolute difference between the series since May 1974 is about 0.5%.
  • The differences between the series are probably randomly distributed, something to be verified.
  • A reasonable working hypothesis is that both series are drawn from the same population with normal sampling error but the MLO series has benefited from marginally better control.
  • Both data series use a calendar year of 365 days. The MLO-week runs from Monday to Sunday and the SIO-week from Sunday to Saturday. Thus, the two data series can be synchronized by aligning observations one day apart. (I am not certain, but the difference may be only 12 hours, between midnight and noon.)
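
The one-day alignment can be sketched with Python's standard library (the function name and the start date are mine, chosen only to show the offset; the real alignment would use the published observation dates):

```python
from datetime import date, timedelta

def pair_weeks(mlo_start, n_weeks):
    """Pair each Monday-based MLO week with the Sunday-based SIO week
    that begins one day earlier (hypothetical alignment rule)."""
    pairs = []
    for k in range(n_weeks):
        mlo_week = mlo_start + timedelta(weeks=k)   # a Monday
        sio_week = mlo_week - timedelta(days=1)     # the preceding Sunday
        pairs.append((mlo_week, sio_week))
    return pairs

# An arbitrary Monday, for illustration only
pairs = pair_weeks(date(1974, 5, 20), 3)
```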

Note: The year adopted by SIO and MLO is close to the tropical year. The International Union of Pure and Applied Chemistry and the International Union of Geological Sciences have jointly recommended using the length of the tropical year in the year 2000, approximately 365.24219 days.

The Gregorian calendar has 365 days in a regular year but, with leap days included, averages 365.2425 days. The difference of 3 days in 10,000 years is not the problem in aligning the weekly data. Rather, the number of weeks per year is the problem.

Some years have 53 weeks, and to analyze the series it is convenient to drop the 53rd week. But a 52-week year has only 364 days, whereas these series are based on 365 days, a loss of one day per year. After 57 years the series would be 57 days, approximately 8 weeks, out of synchronization. That’s the problem.

For analysis, my year overlaps two calendar years by aligning the series so that the week of the vernal equinox is week 1. My year is 52 weeks, extending into the following calendar year. For the SIO data this adjustment results in an average departure of the mean time of observation at the vernal equinox of 0.07+/-1.99 days and a maximum departure of +/-3 days. This variation in observation time is approximately equal to half of the 7-day period over which the daily observations were averaged.


I expect that, after appropriate testing and verification, I will be able to obtain a series of 56 years of nominally continuous weekly observations of CO2. Prior to 1974, 19 gaps in the SIO data must be interpolated to standardize the time to 7 days between observations. Inspection of the SIO data post-1973 suggests that very little interpolation will be needed.

For the initial analysis, I intend to work with the annual maxima and annual minima. Using the SIO series will permit me to apply a version of the Fourier transform (the FFT) that requires the length of the series to be a power of 2. This can be achieved by padding the series to 64 years (2 to the power of 6). The standard padding method is to extend the series using zeros.
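
The padding step can be sketched as follows (Python, with stand-in values; I remove the mean before padding, since appending literal zeros to values near 350 ppmv would create a large artificial step):

```python
import numpy as np

annual = np.linspace(330, 400, 57)        # stand-in for 57 annual values
n = 1 << (len(annual) - 1).bit_length()   # next power of 2: here 64

# Pad with zeros after removing the mean, so the padding does not
# introduce a step discontinuity at the end of the data
padded = np.zeros(n)
padded[:len(annual)] = annual - annual.mean()

spectrum = np.fft.rfft(padded)            # a radix-2-friendly length of 64
```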

But I wonder if this series is so regular that other approaches might be possible. Study of the literature will take up to 3 months. This is an essential step because several papers have passed peer review even though the numerical techniques were faulty to the extent that the authors reported their artifacts as scientific results.

There is not much point in exerting a lot of effort in data preparation and analysis only to cause confusion by using faulty procedures.

In process ….




Statistics with R Using Atmospheric Gases: Part 3 Defining the Model for Mauna Loa CO2

Defining the Model

The title seems to imply there is one model for the Mauna Loa CO2 data series. But there are many, some better than others at describing the data, and some better at revealing the underlying processes that generate the observations. So “defining the model” actually refers to a process of exploration.

What happened to the naïve approach? Rather than proceed as if I knew nothing about the data, I intend to explore the data series without committing to hypotheses about what physical, biological or social factors generated the data.

Properties of the Mauna Loa CO2 Series

The approach to modeling will depend on whether the modeler is working with the daily, weekly, monthly or annual data. As of 20th May 2015, there were 2139 weeks of data from May 19, 1974 to May 10, 2015. For most years there were 52 weeks, but for 7 years there were 53 weeks.

As of now, we have 52 weeks for 41 years, which is 2132 weeks. Plus for 7 of those 41 years we have one extra week, for a total of 2139 observations.

We begin by reading the data as a vector into a spreadsheet, estimate the missing values and then prepare a graph to view the full years from the first week in 1975 to the last week in 2014.

mlco2 1975-2014

My first impressions are: the annual variations are wave-like and the long-term pattern seems to increase almost as a straight line, with very small fluctuations from year to year.

I count as a discovery that the waves appear to be equally spaced and similar in amplitude. This is something to keep in mind for next steps.

We can see that the pattern of CO2 growth and variability is different from a random walk, as described by the Penn State Department of Meteorology in the following graphic.

With a random walk, the variance is not constant but grows over time. Such a process, if subject to shocks, does not revert to the mean of its path. The process is not stationary and, unless it can be transformed into a stationary process, statistical tools will give spurious results.


By contrast, the process that generates the CO2 data does revert to a mean, but it is the mean value of the long-term trend line. The series is not stationary but it is mean-reverting. As a consequence, the series can be transformed so that statistical tools give valid results. The problem is to find a method to transform the data.
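
A small simulation (synthetic data, for illustration) shows the contrast: a random walk drifts arbitrarily far from its starting point, while a trend-stationary series keeps pulling back to its trend line.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
shocks = rng.normal(0, 1, n)

walk = np.cumsum(shocks)            # random walk: every shock accumulates
trend = 0.03 * np.arange(n)
stationary = trend + shocks         # trend-stationary: shocks do not accumulate

# Departures from the trend stay bounded for the trend-stationary series,
# while the random walk wanders far from zero
bounded = np.abs(stationary - trend).max()
drift = np.abs(walk).max()
```

Here `bounded` stays at a few units while `drift` grows roughly with the square root of the series length.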

Close-up Visualization

A closer look at the CO2 data reveals details of short-term patterns and noise (random variations).

Keeling 128 weeks

In this segment of the series there are three peaks and two troughs. It appears that the three peaks could be connected by a straight line.

I count the apparent pattern in peaks as a discovery to be kept in mind.

The seasonal variation seems to follow a saw-tooth pattern, which may or may not result from the upward trend.

I count as a discovery that the wave pattern might be saw-tooth .

Seasonal Pattern: Saw-tooth or Sinusoidal?

By asking this question, I am taking the first step into geophysical theory, but it is a small departure from the naïve approach because it is common knowledge that the Earth has seasons and that the changing of the seasons is related to the Sun, as demonstrated by the following graphic from Robert O’Connell’s course in Astronomy.


I cannot be certain, but I can proceed by using the working hypothesis: the seasonal pattern of the Mauna Loa CO2 series is generated as a sinusoidal series. I reserve judgement on the reason for the saw-toothed appearance.

Caveat: I count as a risk that assuming a sinusoid pattern may bias my results.

Consequences of Sinusoidal Pattern in the Data

The main consequence of assuming the seasonal pattern to be sinusoidal is that statistical software provides sine and cosine functions for transforming the data.

Yet, before proceeding to modeling using trigonometric functions, there is a simple way to explore the data. The following figure shows a complete sine function (A1 and B1) and we know that the full cycle covers 360 degrees, equivalent to one revolution of Earth around the Sun.


We know that area A1 + area B1 = 0 because A1 has positive values and B1 has equal and opposite (negative) values. Also we know that moving the window from left to right by 180 degrees subtracts A1 and adds exactly the same amount at A2.

Intuitively we can see that if we always sum over 360 degrees, then moving one degree from left to right will subtract from A1 exactly the same amount as added at A2.

We conclude that a moving average of 360 degrees (one year) along the entire waveform will always equal zero.

However if we apply a moving average of 52 weeks from 1975 to 2014 the total will not equal zero. If the peaks and the troughs are equally high and low, the sum of weeks 1 to 52, the sum of weeks 2 to 53, and so on will give the average of the peaks and troughs. This will be an estimate of the trend in CO2 increase from week to week with seasonality removed. The result will be a line that represents the trend in CO2 level.
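
This reasoning can be checked on a synthetic series (Python sketch; a pure sinusoid on a linear trend, no noise): the 52-week moving average cancels the seasonal wave exactly and leaves only the trend, shifted by half a window.

```python
import numpy as np

wpy = 52
t = np.arange(wpy * 40) / wpy                         # 40 years of weekly data
series = 330 + 1.7 * t + 3.1 * np.sin(2 * np.pi * t)  # trend plus seasonal wave

kernel = np.ones(wpy) / wpy
smoothed = np.convolve(series, kernel, mode="valid")  # 52-week moving average

# 51 weeks are lost at the ends of the series
weeks_lost = series.size - smoothed.size

# Over any full year the wave sums to zero, so only the trend remains;
# each average is centered 25.5 weeks ahead of its start week
expected = 330 + 1.7 * (t[:smoothed.size] + (wpy - 1) / (2 * wpy))
max_error = np.abs(smoothed - expected).max()
```

With real data the cancellation is only approximate, for the reasons discussed below: the seasonal shape and timing vary from year to year.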

We cannot proceed to the end of the series because once we pass the first week of 2015 we would have only 51 weeks in our sum. We will lose 51 weeks of the series: all at the beginning or end, or 25 and 26 weeks split between the two ends. (I ignore the data from the part years, 1974 and 2015.)

There are other risks in using this approach. This is the real world that is being measured and not a mathematical function. We cannot be sure that the intervals of time in the real world, equivalent to A and B, from May to September, are constant in width. Perhaps the time between maximum and minimum varies by a week or so from year to year.

Nor can we be sure that the moving average fits the amplitude of seasonal variations in the real world. Although the moving average adapts over time periods as long as 52 weeks, over shorter time periods it could generate artifacts.

I have to keep in mind that this analysis will give approximations for exploration only. The fluctuations and anomalies in the data series are so slight that these rough approximations may lead to false conclusions.

The following graphic shows the trend line for years 2012 through 2014. The 53rd weekly data value for 2012 was omitted. This avoided misalignment of the series. By omitting the 53rd week, the date of the vernal equinox (March 20th or 21st) remains almost aligned at about the 12th week.

Precise alignment on the dates of the vernal equinoxes to a precision of +/-3 days will be done during pre-processing. The autumn equinoxes are not expected to become aligned. This may be justified on both geophysical and biophysical grounds. Physics alone tells us that the increase in solar energy at the vernal equinox drives the annual cycle. Thus the decline in solar energy at the autumnal equinox may allow the system to return to its state at the vernal equinox (relaxation).

In physical terms, this view of the system would be equivalent to a variably damped harmonic oscillator with variable periodic perturbation. This seems a long way from the naïve approach, but these ideas are simply working hypotheses and not commitments.

Trend line

seasonal variations

The trend line can be superimposed on the data series. Note that the trend line is equivalent to the zero line in the seasonal graph.

trend line plus seasonal.

The seasonal graphic does not look quite right because the series seems to be declining when the expectation was that it would be neither increasing nor declining. Further, when the trend line is superimposed on the original data, it does not appear to be centered vertically, but instead appears to pass a little below the midpoint between the maximum and minimum values. This visual impression arises mainly because the last year's peak and trough are both lower than those of the preceding two years.

However, the following graphic demonstrates alternative ways of conceptualizing the relationship between seasonal variations and the trend line.

I am using the term “outgassing” to refer to the release of CO2 into the atmosphere from all sources, including combustion and respiration, and the term “drawdown” to refer to the capture of CO2 from the atmosphere by all sinks.

Some sources may also be sinks and vice versa, depending on the season and state of Earth systems.


To explore the data, I assumed that the process of drawdown in summer is balanced by the process of outgassing (including combustion) and that therefore the trend line should pass through the mean between the peaks and troughs. I therefore imposed on the data a trend line that fits this hypothesis.

Yet, the data series itself has two intrinsic trend lines defined by the annual maximum and minimum levels of CO2 in May and September respectively. The gaps between adjacent minima and maxima could provide a measure of the varying width of the channel that contains the data series.

These three measures are non-parametric in the sense that no formal hypothesis needs to be posited concerning the shape of the annual cycle, whether sinusoidal or saw-tooth. This may or may not be a useful way to proceed, but it is prompted by an article by Graham Griffiths, City University, London, UK, and William Schiesser, Lehigh University, USA, in Scholarpedia.

About linear and non-linear waves they say:

The term amplification factor is used to represent the change in the magnitude of a solution over time. It can be calculated in either the time domain, by considering solution harmonics, or in the complex frequency domain by taking Fourier transforms.

Some dissipation and dispersion occur naturally in most physical systems described by PDEs. Errors in magnitude are termed dissipation and errors in phase are called dispersion. These terms are defined below.

…Generally, [numerical dissipation] results in the higher frequency components being damped more than lower frequency components. The effect of dissipation therefore is that sharp gradients, discontinuities or shocks in the solution tend to be smeared out, thus losing resolution…

…the Fourier components of a wave can be considered to disperse relative to each other. It therefore follows that the effect of a dispersive scheme on a wave composed of different harmonics, will be to deform the wave as it propagates. However the energy contained within the wave is not lost and travels with the group velocity. Generally, this results in higher frequency components traveling at slower speeds than the lower frequency components. The effect of dispersion therefore is that often spurious oscillations or wiggles occur in solutions with sharp gradient, discontinuity or shock effects, usually with high frequency oscillations trailing the particular effect…

The naïve approach has led me to view the CO2 series as representing global phenomena that may be subject to amplification, dissipation and dispersion, not something amenable to analysis by spreadsheet.

The pattern of increase and decrease in CO2 may reveal three things: (1) the net increase year by year, (2) the power of the sources, and (3) the power of the sinks.

That the trend line passes through the series about midway between the annual peaks and troughs is a reasonable assumption. Even so, this assumption seems to me to be a theoretical construct for estimating the net power of the sources and sinks.

An alternative way to conceptualize the series is to focus on the peaks and the troughs. The annual increase above the bottoms of the troughs shows when the sources have more power than the sinks. The annual decline below the peaks shows when the sinks have more power than the sources.

Phenology is concerned with the timing of biotic activity and events. The amount of inter-annual variation in timing of the peaks, troughs and the time interval between the peaks and troughs, if any, seems to me to be worth investigating.


Manipulation of the data in a spreadsheet is useful for the exploratory stage of the study. In addition to the results reported here, I explored the following:

1. Mean annual CO2 level and its variance. I found that the variance increases with the mean but did not check formally whether this accords with Taylor’s power law. A rough check showed that the percentage increase in standard deviation scales with the percentage increase in the mean, which points to compound growth in the CO2 level over time.

The apparent increase in variance about the trend line appears to be an artifact of using a linear model instead of a compound growth rate model.

The fit of the arithmetic (linear) trend to the annual average CO2 level seems not significantly different from the fit of the exponential trend. Each regression accounted for over 99% of the variance in the CO2 level. The arithmetic trend was slightly concave upwards and the exponential trend slightly concave downwards. The arithmetic increase in CO2 averaged 1.70 ppmv per year compared to the average seasonal amplitude of about 6.2 ppmv. The seasonal amplitude represented 1.7% of the average CO2 level of 362 ppmv.
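
For illustration, the two fits can be compared on synthetic annual means built with a 0.47% compound growth rate (the rate implied by 1.70 ppmv per year on a 362 ppmv mean; a Python sketch, not the real data):

```python
import numpy as np

# Synthetic annual means with 0.47% compound growth plus a little noise
rng = np.random.default_rng(3)
year = np.arange(41)
co2 = 330 * 1.0047 ** year + rng.normal(0, 0.3, year.size)

# Arithmetic (linear) trend
r2_lin = np.corrcoef(year, co2)[0, 1] ** 2

# Exponential trend: a linear fit to the logarithm
slope, _ = np.polyfit(year, np.log(co2), 1)
r2_exp = np.corrcoef(year, np.log(co2))[0, 1] ** 2
growth = np.exp(slope) - 1              # recovered compound rate
```

Over a span this short, both fits explain well over 99% of the variance, which is why the two trends are so hard to distinguish.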

2. I examined departures from the smooth trend line and found interesting anomalies. Since the methodology is crude, not a lot can be derived from this graph. The 1998 super El Niño left its mark. However, the role of the eruption at Pinatubo in June 1991 may have been exaggerated, since close inspection shows that the CO2 level was declining before the eruption occurred.

annual anomalies

The most significant lesson from analysis of the anomalies is that the difference between the maximum and minimum anomaly was only 6 ppmv, about equal to the seasonal variation, reduced in this graphic by smoothing to about one ppmv.

CO2 is a trace atmospheric gas that increased on average by one-third of one percent per annum between 1959 and 2014. (The percentage growth rate is identical when calculated as linear or geometric.)

The global level of CO2 as measured at Mauna Loa fluctuates seasonally about the annual trend in a range of about 1.5%. (Geometric average of the range taken from peak to trough, range defined as the ratio of annual maximum divided by annual minimum.)
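
A sketch of this definition of the seasonal range (Python, with illustrative numbers near a 362 ppmv mean, not the measured maxima and minima):

```python
import numpy as np

# Illustrative annual maxima and minima (made-up values)
maxima = np.array([364.0, 365.8, 367.5])
minima = np.array([358.4, 360.1, 361.9])

ratios = maxima / minima                             # range = annual max / annual min
geo_mean_range = np.exp(np.log(ratios).mean()) - 1   # geometric average of the range
```

With these numbers the geometric average of the range comes out at roughly 1.5 percent.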

The annual trend line is subject to external shocks, such as El Niño, but is mean-reverting, so that anomalies from the long-term trend line are about the same order of magnitude as the seasonal variations.

All these characteristics of the CO2 data cause statisticians to refer to the series as an example of a deterministic trend that is stationary about its trend line. “Deterministic” in science implies that some external (physical and/or social) processes generate the observations and that the scientific problem is to discover what these processes are and how much each “determines” the observed resultant level of CO2 in the atmosphere and its variation temporally and spatially.

Given the location and elevation of the Mauna Loa Observatory, the atmospheric gases may be well-mixed. If so, this would have the effect of a low-pass filter that allows the ENSO signal to remain but may or may not suppress teleconnections with other oceanic basins, the Atlantic and Indian Oceans in particular. By analogy, continental-scale signals from biotic, geologic and human factors may also be filtered out.

What remains appears to be a global CO2 signal representing net emissions (sources minus sinks). Because there is so little land and population in the Southern Hemisphere, land-based signals may be taken to represent a Northern Hemisphere signal.

Decision about how to Proceed

All of these considerations have led me to decide that analysis by spreadsheet is acceptable for inspecting the main characteristics of the data series, that is, for exploration, but not for much more than exploration.

What I will need to proceed are tools powerful enough to disentangle the temporal, geographical, natural and human factors in CO2 levels and their variability.

At the very least it will be necessary to use trigonometric functions to fit trend lines and to apply Fourier analysis to the detrended CO2 data series, not only the Mauna Loa series, but other series in the global network.

Years ago I converted Peter Bloomfield’s FORTRAN code for Fourier analysis into Microsoft QuickBasic, a programming package that is now obsolete. (Fourier Analysis of Time Series: An Introduction. Wiley, 1976.)

The second edition of Dr. Bloomfield’s book (published in 2000) uses the S-Plus language, a commercial version of S. There are some important differences between R and S, but much of the code for S runs unaltered in R.

A little research about R reveals why the language has become so popular with the scientific community. R is free and so are thousands of add-ons, packages, functions, code snippets and tutorials. Although R was designed specifically for statistical analysis, the language has evolved into a complete system for scientific research and presentation.

Up to this point, my analysis of the Mauna Loa CO2 series has relied on spreadsheets. I had intended to infill missing data using the R programming language, but have decided it will be more efficient to do the pre-processing using Excel and the statistical analysis using R.

Part 4 to follow: Literature Review of the Mauna Loa CO2 Series

Statistics with R Using Atmospheric Gases: Part 2 Estimating Missing Values in the Context of Modeling Mauna Loa CO2


In Part 1 of this series, I set out my interest in atmospheric CO2 and R as a programming language for statistical modeling. I explained why I chose the Mauna Loa CO2 series for learning statistics with R.

Finally, I mentioned that during the last 40 years or so, the weekly record of CO2 at Mauna Loa has 17 missing observations, about 0.8% of the total.

The Purpose of this Article

In this part of this series I propose a methodology for estimating the missing values. Here I establish my reasoning and criteria.

I wish to estimate the missing values for convenience in doing the calculations.

1. The method should not introduce artifacts: estimated values should be consistent with the original data.

2. The estimated values should not change the results of the analysis more than would gaps in observations.

A stricter criterion would be that the estimated values should not change the results more than would a data series that had no missing values. This seems impossible, since we do not know the missing values. However, there is one approach to modelling that suggests that many missing values would have no effect on the results: Fourier analysis. The R language is suitable for doing Fourier analysis and so is the ESRL CO2 data.


The method for infilling missing values must respect the data in the context of the model used to do the analysis. If only a small percentage of values are missing, the data series can still be analyzed as a regular time series. With the Mauna Loa CO2 series, only 17 weekly values are missing out of over 2100 weeks. Thus, the very sparsity of the missing values justifies interpolation of the values.

table of missing co2 data

Apart from the exploration phase of the study, I avoid using moving averages because moving averages can introduce artifacts into analysis: I do not want to base my conclusions on artifacts that I generate by data processing.

Thought experiment

With R, I can use regression to run a line through the weekly CO2 data points. I can use the line to revise my estimate of missing data.

Data points exactly on the regression line cannot affect the shape, location or slope of the line. So estimated missing values that lie exactly on the line will be consistent with the data series and will not alter the results of the analysis.

Caveat: Provided we do not change the model that we use to define the line that fits the data. This caveat seems obvious and is probably the reason ESRL provides raw data, flagging the missing values but not replacing them with estimates.

I plan to back-cast (predict backwards) the weekly CO2 observations, first by week, and then by estimating values for the mid-points of every month. The mid-month estimates can be used to verify that my model results are consistent with the monthly averages provided by ESRL.

The Rationale

This seems like a lot of work to replicate what ESRL has already done. However, this rationale follows the principle: the researcher must know the data before he can do anything with it.

A researcher can come to know the data by exploring it in a naïve way. This is a difficult task in itself, because no Earth scientist or statistician will find it easy to adopt a naïve view in regard to something so familiar as the Keeling Curve.

The ESRL Mauna Loa CO2 graphic is familiar to most scientists and statisticians.

Compatibility of the Scientific Method and Data Interpolation

There is an over-riding imperative: the methodology should be acceptable to both atmospheric physicists and professional statisticians.

I follow the scientific method as advocated by Richard Feynman: First, I guess the missing values. I show how in the graphic below, which is from an Excel spreadsheet. And then I test to determine if the guesses are supported by the observations on the whole data series.

Four missing values in one month is the worst gap that we find in the Mauna Loa CO2 record — a whole month missing, indicated by the orange markers in the graphic. This occurred twice, in 1975 and again in 1984.

ESRL provides a five-year graphic for the CO2 series that suggests a way to interpolate the missing values. The regularity of both trend and seasonal change indicates that interpolated values may contain errors that are no greater than the random noise already present in the data. The close-up graph provided by the Scripps Institution of Oceanography supports this.

I estimated the interpolated figures by using the data indicated by blue diamonds, y1 (346.17) and y6 (347.33). I assumed the growth is geometric and calculated values for y2 to y5 by growing the value y1 at rate r = (y6/y1)^(1/5) - 1. Thus y2 = y1*(1+r) … to … y5 = y4 * (1+r).
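
The calculation is small enough to verify directly (a Python sketch using the two anchor values quoted above; note that r is the growth factor minus one, so that five steps of growth land exactly on y6):

```python
# Anchor values from the 1984 gap, as read from the graph
y1, y6 = 346.17, 347.33
r = (y6 / y1) ** (1 / 5) - 1                     # weekly geometric growth rate

values = [y1 * (1 + r) ** k for k in range(6)]   # y1 through y6
```

The four interior values fill the missing month, rising smoothly from y1 to y6.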

The interpolated values appear to follow a straight line, but only because the curve from y1 to y6 is so slight.

graph of missing values for 1984
Atmospheric CO2 in parts per million. ESRL Mauna Loa data for 31 weeks in 1984
Dates in decimal format

The spreadsheet graphic displays about the same amount of noise as the close-up graph from the Scripps Institution.

This graphical method is just augmented guessing. Its only virtue is that it allows the analysis to proceed as if there were no missing values. Once the data is modeled, the model can be used to test the goodness of fit of the estimates of missing values and the missing values will be adjusted by using the fitted line produced by the model.

This is why I consider the method for interpolating missing values should be specific to the model used to analyze the data: change the model and the justification for estimating the missing values may no longer be valid.

Part 3 to follow: Defining the Model for Mauna Loa CO2