Two Computing Revolutions, Exhibit R: Manipulating & Visualizing MASIE Sea Ice Data

Back in April 2018, I mentioned the idea of two computing revolutions:

“There are two computer revolutions. One revolution is trying to abstract out the technology and present people with an easy, touch interface to accomplish specific tasks. Using your phone to take a picture, send a text message, post to social media, play YouTube videos, etc. are all examples of this type of technology. It’s probably the dominant form of computing now.

The other revolution is the complex computing tools that are being developed that cannot be used via a touch interface.”

The programming language R is an example of the second type of revolution. One simple task it can perform is reformatting data. It can take a long column of numbers from a source such as the MASIE sea ice data, which is essentially unintelligible to people in that form, and reshape it so that the ice extent of a single sea can be compared easily across years, day by day.

Original data
Reformatted data facilitating comparison of a single day across years.
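
As a rough illustration of that reshaping step, here is a minimal sketch in base R. It is not the script used for this post. The column names (yyyyddd, extent), the packed year-and-day date format, and the numbers themselves are assumptions made up for the example.

```r
# Toy long-format data: one row per observation (values are made up).
# ASSUMPTION: the date is packed as year * 1000 + day of year, e.g. 2018003.
long <- data.frame(
  yyyyddd = c(2017001, 2017002, 2017003, 2018001, 2018002, 2018003),
  extent  = c(610, 615, 622, 580, 590, 597)
)

# Split the packed date into separate year and day columns.
long$year <- long$yyyyddd %/% 1000
long$day  <- long$yyyyddd %% 1000

# Pivot: one row per day, one column per year.
wide <- tapply(long$extent, list(long$day, long$year), FUN = function(x) x[1])

# Turn the matrix into a data frame with the day as the first column.
sea_table_demo <- data.frame(day = as.integer(rownames(wide)), wide,
                             check.names = FALSE)
print(sea_table_demo)
```

Each row of the result is one day of the year and each column is one year, which is the layout that makes a single day easy to compare across years.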

While this data manipulation is a powerful tool, it is only the tip of the iceberg. The real power comes from the kind of computing and visual representation that become possible once the data has been reorganized. For example, once we manipulate the data into the above format, we can create a correlation matrix that shows which years are closest to any given year.

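Apart from loading the two packages that provide rcorr() and corrplot(), this chart is generated by five lines of code, two of which open and close a PDF file:

library(Hmisc)     # provides rcorr()
library(corrplot)  # provides corrplot()
sea_rcorr <- rcorr(as.matrix(sea_table[, -c(1)]))  # correlate every column except the first
sea_coeff <- sea_rcorr$r                           # extract the correlation coefficients
pdf(paste0("./output/sea-ice-in-", sea, "-correlation-", sea_date, ".pdf"))
corrplot(sea_coeff, method="pie", type="lower")
dev.off()  # close the PDF device so the file is written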

We can then build on this step: take the five years most closely correlated with the current year and chart them over a specific period of days, like so:

Looking at this chart, we can make a pretty accurate guess as to what the extent of sea ice will be on Day 100, because we can see that 2014, 2015 and 2016 are the closest years for comparison, and we can easily see by how much.
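
Picking the closest years can be sketched as follows. This is a minimal base R illustration, not the script behind the chart: it uses cor() as a stand-in for rcorr(), and the table name, the years and the values are made up for the example.

```r
# Toy wide table: one row per day, one column per year (values are made up).
sea_table_demo <- data.frame(
  day    = 1:6,
  `2015` = c(10, 12, 15, 14, 18, 21),
  `2016` = c(11, 13, 15, 15, 19, 22),
  `2017` = c(20, 18, 19, 15, 14, 12),
  `2018` = c(10, 12, 14, 14, 18, 20),
  check.names = FALSE
)

# Pairwise Pearson correlations between the year columns (day column dropped).
coeff <- cor(sea_table_demo[, -1], use = "pairwise.complete.obs")

# Rank the other years by their correlation with the current year.
current <- "2018"
others  <- coeff[current, colnames(coeff) != current]

# Keep the two most similar years (the post uses five on the real data).
closest <- names(sort(others, decreasing = TRUE))[1:2]
print(closest)
```

Note that correlation measures how similarly two years rise and fall, not how close their absolute values are, so it is worth checking the chart itself before reading a forecast off the nearest years.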

R is a really powerful tool, particularly if you have to do repeated calculations on data that updates frequently and you need to present the results in a format that helps decision making. It is far superior to anything I’ve ever done using spreadsheets. But it does take time to set up initially, and it is difficult for individuals to develop the expertise to use it effectively if they are not part of a team, like the one described in the article I posted earlier today: How the BBC Visual and Data Journalism Team Works With Graphics in R. Still, it is a much better tool for certain kinds of problems, and it is worth looking into if you find yourself working through complex data to make decisions.

Try It Yourself

Download RStudio for the operating system you use. Take the R script I used to manipulate the data and generate the charts above. Unfortunately, WordPress does not take text uploads, so the linked script is in OpenDocument Text (ODT) format. You should be able to copy and paste it into the Source pane in RStudio and fix the formatting so it’ll run. You’ll also need to install the relevant packages in the Packages pane on the lower right side. Then, since the script is written as a function, you’ll need to call the function in the Console pane once you have loaded it, with something like:

> sea_ice(sea="Greenland_Sea", day_one=1, last_day=365)

Just the relatively easy exercise of getting this to work could serve as a starting place to get a sense of how R works and how you might incorporate it into your workflow. It’s worth giving a try.
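
If the idea of a script "written as a function" is new, the sketch below shows the pattern. The function sea_ice_demo is hypothetical: it borrows the argument names from the call above, but its body is only an illustration and is not the code in the linked script.

```r
# HYPOTHETICAL stand-in for the script's function. Running this definition
# only stores the function; nothing happens until it is called.
sea_ice_demo <- function(sea = "Greenland_Sea", day_one = 1, last_day = 365) {
  # Basic sanity checks on the arguments.
  stopifnot(is.character(sea), length(sea) == 1,
            day_one >= 1, last_day <= 366, day_one <= last_day)
  # A real script would read and chart the data here; this sketch only
  # returns the sea name and the window of days it would analyze.
  list(sea = sea, days = seq(day_one, last_day))
}

# Calling the function is what actually does the work.
result <- sea_ice_demo(sea = "Greenland_Sea", day_one = 90, last_day = 100)
print(length(result$days))
```

This is why loading the script in the Source pane is not enough on its own: the work starts only when the function is called from the Console.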

Note 1: You will need to replace “Greenland_Sea” with the area of interest to you, i.e., “Northern_Hemisphere”, “Beaufort_Sea”, “Chukchi_Sea”, “East_Siberian_Sea”, “Laptev_Sea”, “Kara_Sea”, “Barents_Sea”, “Baffin_Bay_Gulf_St._Lawrence”, “Canadian_Archipelago”, “Hudson_Bay”, “Central_Arctic”, “Bering_Sea”, “Baltic_Sea”, “Sea_of_Okhotsk”, “Yellow_Sea”, or “Cook_Inlet”.

Note 2: This was my first serious attempt to write anything useful in R. I have some minor experience writing in Perl, Python, and a few other computer programming languages that helped make this easier to do. Still, it is worth noting I’m not a programmer. Writing programs in R is a skill that can be learned by many people and be useful to some degree to anyone.

How the BBC Visual and Data Journalism Team Works With Graphics in R

“Over the past year, data journalists on the BBC Visual and Data Journalism team have fundamentally changed how they produce graphics for publication on the BBC News website. In this post, we explain how and why we have used R’s ggplot2 package to create production-ready charts, document our process and code and share what we learned along the way.”

—BBC Visual and Data Journalism, “How the BBC Visual and Data Journalism team works with graphics in R.” February 1, 2019.

I’ve been learning a bit of R and working with packages like ggplot2. I thought this article gives a nice demonstration of R’s capabilities and of why someone might want to learn to use it, and it provides some useful references.