I’m not here to pretend I’m a climate scientist….or any kind of scientist..I’m not…I’m an accountant, which in my book is at least the third best thing:) A few weeks ago I posted my inaugural climate data article “Climate Data + Database Linked Pivot Tables = Awesome!!” which gave a very brief overview of the data set and included a link to an excel file I had created that gave users the ability to find a climate station near them and chart the temperature over time to see for themselves what their local climate was doing.
Today…I want to dig a little deeper into the data and distribute a new file.
Perfect Climate Data:
Let’s kick this off with a chat about “perfect” data in respects for determining the average temperature of the earth. First off, it doesn’t exist…certainly not in the accounting world, and unfortunately not in the real world either. But if it did, what would it look like? Well, the earth’s area is about 200 million square miles. If we are going to dream big, how about one sensor per 100 (10 x 10) square miles. We would create a nice global grid consisting of about 2 million sensors. Each would be identical and perfectly calibrated of course. Each would take a temperature reading once an hour and feed it to a centralized database. Oh, and while we were at it…it would have been great if someone had set this up….oh, maybe 10-20 thousand years ago. Once we had all of that data, it would be a simple matter of dumping it into a computer far more powerful than mine, and plotting out some spiffy charts in Excel. After all this, the data could speak for itself. Yes…the earth is warming, or no, the earth is not warming….case closed.
Actual Climate Data:
Now, let’s talk about actual climate data. There are ~30,000 sites in NOAA’s database with temperature data. It’s a bit shy of the 2M discussed above, but it’s still a big number and it covers a big chunk of the globe, despite being concentrated in the US…see map here: However, for the data to be useful, we need reliable data….with accurate and consistent readings over long stretches of time. Obviously we don’t have tens of thousands of years of data…..or thousands. The oldest station we have temperature data for is Milan Italy where records start in 1763 and for 245 nearly perfect years….someone using some kind of temperature measuring device recorded a high and low temperature of the day stopping for some reason in 2008.
It does look like the data got a little hairy at the end….jumping about ~10 degrees…let’s zoom in a bit changing the scale to 50-70 instead of 0-70.
This is interesting….looking at the 20 year trailing average we can see that over 200 years there was warming, and cooling, and more warming. Even if we chop off the end which is a little sketchy, one would probably say that from 1763 to ~2000…Milan Italy got a few degrees warmer. Now…from experience…the accountant in me is always questioning the integrity of the data. How was it collected? Who collected it? What instrument? How was it recorded….and how did 200 year old data….initially recorded on what…scrolls…..ultimately get put into a database for me to download…in Fahrenheit no less? This particular data set has only daily high’s and lows. Lows are typically hit in the very early morning…this is easy to capture on modern digital equipment…how was it done from 1763 to oh say 1970? Just the questions that pop into my mind :)….and don’t get me started on corporate accounting data…basically just assume it is all garbage…..but that’s another topic for another day.
That was fun, but let’s just cut to the chase…What I am primarily interested in is the last 50 years. So I looked at the years 1967-2016 and found that there were 184 stations with perfect data…that is…18,263 data points over the 50 year period. All of the rest have missing days…which while a small issue, would possibly tilt the scale one direction or another….a missing summer day would make the site appear cooler, a missing winter day would increase the average a bit. If we relax the requirement a bit….allowing up to 10 missing days over the 50 year period increases the site pool from 184 to 350. We can then march down….if we include stations with 99% data(missing up to 183 days) we have 1,104 stations and if we relax it to 95%we get 2,934 total stations. Let that sink in…For the last 50 years, 1967-2016, we have perfect temperature data for only 184 stations.
Above…I’ve plugged them into a google map which you can find here. Obviously none over the ocean and all but maybe a dozen in the US.Here is the 95% map
Expanding to sites with 95% coverage gets us quite a bit better coverage with ~3k sites, but still we have large missing chunks including most of Africa and South America.
So…what is a climate scientist to do? Well…this is getting long, so we will talk about that next time. For now, let me leave you with a new file.
What I have done is taken the average daily temperatures for our “perfect” 184 stations and squeezed them all into a database linked pivot table. Same concept as before, but now we are only looking at “perfect” sites. The file is large…43MB, but remember it does have over 3 million daily records. I have configured 2 dashboards…one annual…and one I have labeled daily….which I plan to use to track the progress of 2017 as we march through the year. Basically for this one we have added a filter for day of the year, allowing us to in this case isolate the first 59 days of the year(Jan+Feb) and calculate YTD averages. Maybe most importantly…I have inserted 2 blank pivot tables….if you are handy with pivot tables, you can use these to slice and dice those 3M data points any way you want…just be careful 🙂
Here’s your file:Perfect_50_Daily_1967+(3-7-2017)
And here’s your teaser….January and February 2017 marked the warmest start to a year in the last 50 years for our “perfect” data set. No complaints here…The weather in Houston has been phenomenal….feels like we’ve been given 2 extra months of spring….not looking forward to August though :).