The tidyverse style guide

“All style guides are fundamentally opinionated. Some decisions genuinely do make code easier to use (especially matching indenting to programming structure), but many decisions are arbitrary. The most important thing about a style guide is that it provides consistency, making code easier to write because you need to make fewer decisions.”

-Hadley Wickham, “The tidyverse style guide.” style.tidyverse.org

Probably the definitive guide for writing R code. See also Hadley Wickham’s Advanced R.

Problems of Post Hoc Analysis

“Misuse of statistical testing often involves post hoc analyses of data already collected, making it seem as though statistically significant results provide evidence against the null hypothesis, when in fact they may have a high probability of being false positives…. A study from the late-1980s gives a striking example of how such post hoc analysis can be misleading. The International Study of Infarct Survival was a large-scale, international, randomized trial that examined the potential benefit of aspirin for patients who had had a heart attack. After data collection and analysis were complete, the publishing journal asked the researchers to do additional analysis to see if certain subgroups of patients benefited more or less from aspirin. Richard Peto, one of the researchers, refused to do so because of the risk of finding invalid but seemingly significant associations. In the end, Peto relented and performed the analysis, but with a twist: he also included a post hoc analysis that divided the patients into the twelve astrological signs, and found that Geminis and Libras did not benefit from aspirin, while Capricorns benefited the most (Peto, 2011). This obviously spurious relationship illustrates the dangers of analyzing data with hypotheses and subgroups that were not prespecified (p.97).”

-Mayo, quoting the National Academies of Science “Consensus Study” Reproducibility and Replicability in Science (2019), in “National Academies of Science: Please Correct Your Definitions of P-values.” Statsblogs. September 30, 2019.
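The arithmetic behind the astrological-sign example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of the trial's analysis: it assumes twelve independent subgroup tests, no true subgroup effect, and a conventional significance threshold of 0.05. The function names are made up for this sketch.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** num_tests


def simulate_any_false_positive(num_tests, alpha=0.05, trials=20_000, seed=0):
    """Estimate the same quantity by simulation.

    Assumption: with no real effect, each test's p-value is uniform on [0, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study": draw a p-value for every subgroup and check
        # whether any of them falls below the threshold by chance.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("analytic: ", round(familywise_error_rate(12), 3))
    print("simulated:", round(simulate_any_false_positive(12), 3))
```

Under these assumptions, the chance that at least one of twelve signs looks "significant" purely by accident is about 46 percent, which is why subgroups that were not prespecified deserve suspicion.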

The Plain Person’s Guide to Plain Text Social Science by Kieran Healy

The Plain Person’s Guide to Plain Text Social Science is written for graduate students in the social sciences, but it is useful for any writer. For people not doing sophisticated data analysis, the key suggestions are to use a text editor like Emacs for writing, Markdown for formatting, Git (hosted on a service such as GitLab) for version control, and a document converter like Pandoc to turn your text file into a variety of formats, such as EPUB, PDF, and DOCX. Additionally, Healy strongly recommends automated backups of your data to a cloud service. He mentions two standard options, but if you go that route, consider a privacy-focused service like SpiderOak or the free software alternative, Nextcloud.

Details

The Plain Person’s Guide to Plain Text Social Science is worth reading for anyone involved with writing, research or data analysis. It introduces the problem of thinking about the tools that we use to do our work and serves as a technical primer for a particular style of writing.

Kieran Healy starts with a dichotomy (see Section 1.2): there are two computer revolutions. One revolution tries to abstract away the technology and present people with an easy touch interface for accomplishing specific tasks. Using your phone to take a picture, send a text message, post to social media or play YouTube videos are all examples of this type of technology. It’s probably the dominant form of computing now.

The other revolution is the development of complex computing tools that cannot be used through a touch interface. At this point, there is no way to use an open-source machine learning library like Google’s TensorFlow that is going to make sense to the vast majority of people.

As we move to using a keyboard, this tension can be seen in the different types of tools we can use to write, research and do analysis. Microsoft Word, PowerPoint, Excel, Access, etc. were designed to be digital equivalents of their analog predecessors – the typewriter, the overhead projector, the double-entry account book or the index file. Of course, the digital equivalents offered additional capabilities, but they were still tied to the model of the business office. The goal for these tools, even as they include PivotTables and other features, is to be relatively easy to learn and use for the average person in an office.

The other computing revolution is bringing tools to the fore that are not tied to these old models of the business office and is combining them in interesting new ways. But these tools have a steep learning curve. For example, embedding code in the text of an analysis, so that the calculations are generated when the document is typeset, is not a feature the average person working in a typical office needs. But it clearly has advantages in some contexts, such as for data analysts.
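To make the idea of code embedded in text a little more concrete, here is a toy sketch in Python. It is not how any particular typesetting tool works. It only shows the principle that numbers in the prose are computed from the data when the document is built, not typed in by hand. The function name `render_report` and the `$n` and `$mean` placeholders are invented for this illustration.

```python
from statistics import mean
from string import Template


def render_report(template_text, scores):
    """Fill the placeholders in a plain-text report with values
    computed from the data at build time."""
    values = {
        "n": len(scores),
        "mean": f"{mean(scores):.2f}",
    }
    return Template(template_text).substitute(values)


# The "manuscript" contains placeholders instead of hand-typed numbers.
SOURCE = "We surveyed $n respondents; the mean score was $mean."

if __name__ == "__main__":
    print(render_report(SOURCE, [3, 4, 5, 4]))
```

If the data change, rebuilding the document updates every number in the text, which removes one common source of copy-and-paste errors.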

Complexity makes mistakes easier to make, so it requires a different way of working. We have to be careful to document the calculations we use, track versions from multiple sources, be able to fold changes back into a master document without introducing errors, and so forth. The Office model of handing a “master document” back and forth, with the process bottlenecked while individuals make their revisions, isn’t going to work past a certain baseline level of complexity that we are slowly evolving past.
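One reason plain text suits this way of working is that changes between drafts can be compared line by line. The sketch below uses Python's standard `difflib` module to show the kind of record a version control system keeps. It is only an illustration. The file names and draft contents are made up, and real tools such as Git do considerably more than this.

```python
import difflib


def describe_changes(old_text, new_text):
    """Return a unified diff between two plain-text drafts."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="draft_v1.md",  # hypothetical file names
        tofile="draft_v2.md",
        lineterm="",
    )
    return "\n".join(diff)


# Two made-up drafts that differ in a single line.
V1 = "# Methods\nWe sampled 100 people.\nResults follow."
V2 = "# Methods\nWe sampled 120 people.\nResults follow."

if __name__ == "__main__":
    print(describe_changes(V1, V2))
```

The output marks the removed line with a minus sign and the added line with a plus sign, so a collaborator can see exactly what changed without rereading the whole document.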

So, having laid out this case, he then suggests various tools to consider: a text editor such as Emacs, Markdown for formatting, Git for version control, Pandoc for converting text documents into other formats, automated backups, a cloud backup service, etc. All of these tools are equally important to complex writing of any sort, whether long works of fiction, research analysis, collaborative writing, or the other demanding projects we are increasingly likely to take on, which these more powerful tools help make possible.
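As a small illustration of the conversion step, the following Python sketch builds the Pandoc commands for turning one Markdown file into several output formats. It assumes Pandoc is installed and that it infers the output format from the file extension given to its `-o` option. The function name and the file `paper.md` are hypothetical, and the sketch only prints the commands, it does not run them.

```python
from pathlib import Path


def pandoc_commands(source, extensions=("epub", "pdf", "docx")):
    """Build one pandoc command per output format.

    Each command has the form: pandoc <source> -o <source with new extension>
    """
    src = Path(source)
    return [
        ["pandoc", str(src), "-o", str(src.with_suffix("." + ext))]
        for ext in extensions
    ]


if __name__ == "__main__":
    for cmd in pandoc_commands("paper.md"):
        # Pass each list to subprocess.run(cmd, check=True) to run it for real.
        print(" ".join(cmd))
```

Note that producing a PDF this way usually also requires a LaTeX installation or another PDF engine that Pandoc can call.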