Social Explorer is an online data-mapping application. According to its pricing chart, the free tier only includes access to the 2000 Census. But it looks like an interesting tool for analyzing data, and I wanted to bookmark it for future reference.
Tag: data analysis
A View of Despair
A View of Despair is a really interesting visualization of suicide statistics in the Netherlands in 2017.
The Pandemic Cyclone
“This chart shows the daily number of new Covid-19 infections in each state over time. The y-axis is the population-normalized number of new infections per day; the x-axis is the rate of transmission (Rt). Each dot is a state or territory of the US, colored by region, and the area of the dot is proportional to the estimated number of new Covid-19 infections on that day.”
https://observablehq.com/@chrisjkuch/covid-hotspots
I thought this was an interesting way to visualize the time series data.
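For future me: a single frame of this kind of phase plot is straightforward to sketch with matplotlib. The states, numbers, and dot-size scaling below are all invented for illustration; the actual notebook animates one dot per state across days.

```python
# A rough sketch of one frame of the "pandemic cyclone" phase plot.
# Everything here is synthetic; the real chart traces each state's
# dot through this space over time.
import matplotlib.pyplot as plt

# Hypothetical snapshot for a single day (invented numbers).
states  = ["NY", "TX", "FL", "WA"]
rt      = [0.9, 1.3, 1.5, 1.1]         # transmission rate Rt (x-axis)
per_cap = [12.0, 25.0, 40.0, 8.0]      # new infections per 100k per day (y-axis)
est_new = [2300, 7200, 8600, 640]      # estimated new infections (dot area)
colors  = ["tab:blue", "tab:orange", "tab:orange", "tab:green"]  # by region

fig, ax = plt.subplots()
ax.scatter(rt, per_cap, s=[n / 20 for n in est_new], c=colors, alpha=0.5)
for x, y, label in zip(rt, per_cap, states):
    ax.annotate(label, (x, y), ha="center", va="center", fontsize=8)

ax.axvline(1.0, color="gray", linestyle="--")  # Rt = 1 divides growth from decline
ax.set_xlabel("Transmission rate (Rt)")
ax.set_ylabel("New infections per 100k per day")
ax.set_title("Covid-19 phase plot, single day (synthetic data)")
plt.show()
```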
Effective Data Visualization: Transform Information into Art
“In this course, [Data illustrator Sonja Kuijpers] gives you the tools you need to transform data into captivating illustrations using colors, shapes, and images. Discover how to collect and analyze data sets, as well as how to transform them into a unique poster that tells a story. Are you ready to create your own data art?”
–Effective Data Visualization: Transform Information into Art
I had never heard of Domestika, an online learning platform, before. This course seems awesome. Bookmarking for later.
The tidyverse style guide
“All style guides are fundamentally opinionated. Some decisions genuinely do make code easier to use (especially matching indenting to programming structure), but many decisions are arbitrary. The most important thing about a style guide is that it provides consistency, making code easier to write because you need to make fewer decisions.”
—Hadley Wickham, “The tidyverse style guide.” style.tidyverse.org
Probably the definitive guide for writing R code. See also Hadley Wickham’s Advanced R.
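The guide is about R, but the consistency point is language-agnostic. A tiny Python illustration: both versions below are legal, and neither is objectively right; the value of a style guide is that you pick one form once and never spend another decision on it.

```python
# Two legal ways to write the same call. Neither is objectively right;
# a style guide's value is that you choose one form once, so every
# future line of code costs fewer decisions.
import textwrap

# Indentation mirrors the structure of the call (the "matching
# indenting to programming structure" idea from the quote above):
summary = textwrap.shorten(
    "All style guides are fundamentally opinionated.",
    width=30,
    placeholder="...",
)

# The same call flattened onto one line; equally valid, just a
# different arbitrary choice:
summary = textwrap.shorten("All style guides are fundamentally opinionated.", width=30, placeholder="...")

print(summary)
```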
Problems of Post Hoc Analysis
“Misuse of statistical testing often involves post hoc analyses of data already collected, making it seem as though statistically significant results provide evidence against the null hypothesis, when in fact they may have a high probability of being false positives…. A study from the late-1980s gives a striking example of how such post hoc analysis can be misleading. The International Study of Infarct Survival was a large-scale, international, randomized trial that examined the potential benefit of aspirin for patients who had had a heart attack. After data collection and analysis were complete, the publishing journal asked the researchers to do additional analysis to see if certain subgroups of patients benefited more or less from aspirin. Richard Peto, one of the researchers, refused to do so because of the risk of finding invalid but seemingly significant associations. In the end, Peto relented and performed the analysis, but with a twist: he also included a post hoc analysis that divided the patients into the twelve astrological signs, and found that Geminis and Libras did not benefit from aspirin, while Capricorns benefited the most (Peto, 2011). This obviously spurious relationship illustrates the dangers of analyzing data with hypotheses and subgroups that were not prespecified (p.97).”
—Deborah Mayo, quoting the National Academies of Sciences consensus study Reproducibility and Replicability in Science (2019), in “National Academies of Science: Please Correct Your Definitions of P-values.” Statsblogs, September 30, 2019.
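Peto’s zodiac stunt is easy to reproduce with a quick simulation. With twelve subgroup tests at α = 0.05 and no true effect anywhere, at least one sign comes up “significant” in roughly 1 − 0.95¹² ≈ 46% of studies. A sketch in Python, using entirely synthetic data:

```python
# Simulate Peto-style post hoc subgroup testing on pure noise: a
# "treatment" with zero true effect, sliced into 12 zodiac-sign
# subgroups, each given its own two-sided test at alpha = 0.05.
import random
import statistics

random.seed(42)
SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]

def significant_signs(n_per_arm=200):
    """Return the signs that look 'significant' despite no real effect."""
    hits = []
    for sign in SIGNS:
        treated = [random.gauss(0, 1) for _ in range(n_per_arm)]
        control = [random.gauss(0, 1) for _ in range(n_per_arm)]
        diff = statistics.mean(treated) - statistics.mean(control)
        se = (statistics.variance(treated) / n_per_arm
              + statistics.variance(control) / n_per_arm) ** 0.5
        if abs(diff / se) > 1.96:  # crude z-test, alpha = 0.05 two-sided
            hits.append(sign)
    return hits

runs = [significant_signs() for _ in range(100)]
hit_rate = sum(1 for r in runs if r) / len(runs)
print(f"Studies with at least one 'significant' sign: {hit_rate:.0%}")
```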
On the Structure of Popular Films
“James E. Cutting, a Cornell University psychology professor, has compiled several datasets on the structure of popular films, including one that indicates the length of each shot in 220 movies from 1915 to 2015.”
—James E. Cutting website
h/t Data is Plural
The Stages of Relationships, Distributed | FlowingData
A bubble animation showing the stages of a relationship (first met, romantic, living together, married) over time, split into two halves comparing the 1970s with the 2010s. Key insight: living together is much more prevalent now.
The Plain Person’s Guide to Plain Text Social Science by Kieran Healy
Healy argues there are two computing revolutions underway: one toward simple, touch-driven devices, and another toward powerful, complex tools that cannot be used via a touch interface. At this point, there is no way to use an open-source machine-learning library like Google’s TensorFlow in a way that will make sense to the vast majority of people.
Once we are at a keyboard, this tension shows up in the different types of tools we can use to write, research, and do analysis. Microsoft Word, PowerPoint, Excel, Access, and the like were designed as digital equivalents of their analog predecessors: the typewriter, the overhead projector, the double-entry account book, the index file. The digital versions offered additional capabilities, of course, but they were still tied to the model of the business office. The goal for these tools, even as they add features like PivotTables, is to be relatively easy for the average person in an office to learn and use.
The other computing revolution brings tools to the fore that are not tied to these old models of the business office and combines them in interesting new ways. But these tools have a steep learning curve. For example, embedding executable code in a written analysis, so that the calculations are generated when the document is typeset, is not a feature the average person in a typical office needs. But it clearly has advantages in some contexts, such as for data analysts.
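That embedding is literate programming, the sort of thing R Markdown and knitr, or these days Quarto and Jupyter, provide. As a minimal sketch of the idea, here is a hypothetical Quarto-style document with an embedded Python chunk (the file contents are invented for illustration):

````
---
title: "A minimal literate document"
format: html
---

The mean below is computed when the document is rendered,
not pasted in by hand:

```{python}
temps = [12.1, 14.3, 13.8, 15.0]
print(f"Mean temperature: {sum(temps) / len(temps):.1f} °C")
```
````

Rendering it (for example with quarto render) recomputes the number every time, so the prose and the calculations cannot drift apart.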
Complexity makes mistakes easier to make, so it demands a different way of working. We have to document the calculations we use, track versions from multiple sources, and fold changes back into a master document without introducing errors. The Office model of handing a “master document” back and forth, with the process bottlenecked waiting on individuals making revisions, stops working past a certain baseline level of complexity, and we are slowly evolving past that baseline.
Laying out this case, he then suggests various tools to consider: a text editor such as Emacs, Markdown for formatting, git for version control, Pandoc for converting text documents into other formats, backup systems, a cloud backup service, and so on. These tools matter for complex writing of any sort, whether it is long-form fiction, research analysis, or collaborative writing: circumstances we are increasingly likely to find ourselves in, and which these more powerful tools help make possible.