Tag Archives: Medium Data

Climate Data + Database Linked Pivot Tables = Awesome!!

I have been tinkering with the NOAA climate data for years now…and am finally ready to start pushing out some files I think you will find both useful and interesting. We’ve all heard a lot about global warming and climate change over the last 20 years, but have you ever wondered how the climate is changing in your hometown? I wondered that exact thing a few years ago when I started playing with the available climate data. The Excel file provided for free below gives you a simple way to find a station near to you and chart the climatological history of that site.

First…a the history….I stumbled across database linked pivot tables back in 2011. I was in an accounting role, working on some massive global account reconciliations where we had some really big problems following an acquisition and new ERP rollout…..among other things the ERP was creating some erroneous entries in ways that we didn’t really understand. I was already well versed in pivot tables, but the data sets we were working with quickly grew to over the 1M records we could paste in an excel worksheet and then pivot…. So we did the obvious thing…we chopped the data up and soon were dealing with lots of huge files and lots of pivot tables….it worked, but not well, and we were crashing excel on a regular basis.

Then two things happened that changed the path of my career. First, one of our IT guy showed me how I could simply link an excel pivot table to a database….which could hold millions (or billions) of records….solving the first problem of how to break the 1M record barrier. Then I stumbled across an article about 64 bit excel 2010…which allowed Excel to use up to 4GB of RAM while the standard 32 bit excel was limited to 2GB. (Starting with Excel 2013 64 bit Excel can use as much memory as your system has available…8…16…64??) 64 bit excel was a phenomenal game changer primarily because it was extraordinarily stable. I had spent a good chunk of my career trying to avoid crashing Excel…but still it crashed on what seemed like a daily basis. After upgrading to 64 bit Excel 2010, I could count the number of crashes I had on one hand, and every single one of those was me knowingly pushing it and my system to the limit. I could go on, but the bottom line is that with 64 bit excel, and a modern mid-grade computer, we now have a stable tool that can be used by analysts to handle hundreds of millions of records. Yes….you read that right…hundreds of millions….essentially limited by your systems memory, and with a high dollar machine you can get to the billions.

So…how to test? It didn’t take long to put the idea to work inside the corporate accounting group…we had millions of records and we quickly found dozens of ways to utilize our new toy. However, I knew we were just scratching the surface…So I got a bigger computer for home and started playing around with a much bigger data set…the NOAA climate data. I will note that it was an incredible pain in the rear spanning several years in fits and starts, but I have recently finished compiling the daily climate data set. It has over 2.5 billion records covering just over 100,000 climate stations, some going all of the way back to the 1700’s. Can you cram 2.5B records into an excel pivot table? Actually yes you can….but I wouldn’t recommend it…it pretty much brings my high(ish) end I7 to it’s knees….fortunately, with this data set, we can carve it up into useful bits…which is what I hope to do on a regular basis going forward.

And so, I present to you the first of many Climate Data files….Global Average Temps. The process is quite simple. First, we take the daily max and min temperature for every station and calculate the average daily temperature. If either is missing for a given day, the day is dropped. From there, we calculate an average by year, along with a count of data points. This is critical because the majority of sites have at least some missing data here or there… Of the 30k+ sites with temperature data, only a few hundred have 50+ years “perfect” (No missed days) data. Now…let me make this perfectly clear…in no way shape or form do I ever alter or change anything from the original NOAA data set….even if it is blatantly wrong….and there are plenty that are. With 2.5B+ records….all I can do is take the official record….process it (calculating an average for example) and dump it into an excel pivot table/dashboard with some pretty charts. So…if something looks weird…my advice is to double click on that year in the table, which will give you the actual record. If that year has a count of 50 instead of something closer to 365…that’s your problem. In this file…I am pushing out everything. In later files I will isolate only stations/years with perfect, or close to perfect data sets…like maybe 350+ data points or we drop the year….stay tuned for that.

Here is your snapshot of the dashboard. There are 2 ways to find a climate station…the filter boxes in the upper left let you search for a specific station. You can also use the “slicers” to the right of the chart to select a country, then state(if US) then scroll through the station names to find one you may be interested in. To select multiple, you can hold down the [Ctrl] button to select or unselect multiples. Also remember you can clear your slicer selections by clicking on the icon in the top right corner of each….if you can’t find something it’s probably because you have a filter engaged. Selecting multiples will give you the average…just remember this isn’t a statistically sound method for calculating averages for too many reasons to list :). Just glancing at the New York Central Park Station selected in the screenshot above, we can see the variability in the year to year temperatures and the slope of both the 5 year and 20 year averages heading up.

And I know that got a little long….without further ado…your file:


It’s  only 17MB…yet contains hundreds of years of data for 30k+ global sites. Enjoy!! Yes…the download is going to be slow….it’s free after all.