Berkson’s Paradox

“My go to example of this (I don’t actually know if it’s true, but it’s a nice illustration) is that professional tennis players tend to be either tall or fast – tall tennis players tend to be slower, fast tennis players tend to be shorter.

You could come up with some complicated biological explanation about why these two traits might be negatively correlated, but it would be wrong, because they’re not negatively correlated in the general population, or at least not to the same degree. The reason is much simpler than that: Short, slow, people will rarely play tennis professionally.

As a result, if height and speed are entirely independent of each other (even if they’re slightly positively correlated!) when you look at professional tennis players they will become negatively correlated, because it’s more likely to be one or the other than it is to be both.

This may seem like a weird niche edge case, but once you start noticing it it’s everywhere.”

-David R. MacIver, “Berkson’s paradox is everywhere.” DRMacIver’s Notebook. March 15, 2020.

The Wikipedia examples aren’t very accessible, but I think the basic idea is this: if you select according to two criteria (for example, keeping anyone who is strong on at least one of them), then within the resulting set you may see a relationship between those two qualities that doesn’t exist outside it. It’s one example of why correlation doesn’t imply causation. The cause of the correlation is the selection process, not any relationship between the two qualities.
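To check that selection alone can manufacture the correlation, here is a minimal simulation sketch. It rests on assumptions of mine, not MacIver's: height and speed are standardized normal scores, `rho` sets their population correlation (0 for independent, a small positive value for the "slightly positively correlated" case), and "turning pro" means height plus speed clears a fixed `threshold`. The names `simulate`, `pearson`, `threshold` and `rho` are hypothetical and for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=100_000, rho=0.0, threshold=1.5, seed=0):
    """Return (population correlation, correlation among 'pros', number of pros).

    Assumption: height and speed are standard normal scores with correlation rho,
    and someone 'turns pro' only if height + speed > threshold (hypothetical rule).
    """
    rng = random.Random(seed)
    height, speed = [], []
    for _ in range(n):
        h = rng.gauss(0, 1)
        # Mix in independent noise so that corr(height, speed) == rho.
        s = rho * h + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        height.append(h)
        speed.append(s)

    # Selection step: short, slow people rarely clear the bar.
    pros = [(h, s) for h, s in zip(height, speed) if h + s > threshold]
    pro_height = [h for h, _ in pros]
    pro_speed = [s for _, s in pros]
    return pearson(height, speed), pearson(pro_height, pro_speed), len(pros)


if __name__ == "__main__":
    for rho in (0.0, 0.1):
        overall, among_pros, count = simulate(rho=rho)
        print(f"rho={rho}: everyone r={overall:+.3f}, pros r={among_pros:+.3f} (n={count})")
```

Under these assumptions the correlation among the selected group comes out strongly negative in both runs, even when the population correlation is zero or slightly positive. Other selection rules (say, requiring height *or* speed above some bar) should show the same qualitative effect, since each one discards the people who are low on both.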