Not statistically significant…

Most people have no idea what “Not statistically significant” means and I don’t see the media being too eager to fix this.

Say you read the following piece in a newspaper:

A study done at the University of Washington showed that, after controlling for race and socioeconomic class, there was no statistically significant difference in athletic performance between those who stretched for 5 minutes before running and those who did no stretching at all.

What do you conclude from that? Stretching is useless? WRONG.

Here’s what the hypothetical study actually was: I picked four random guys on campus and asked two of them to stretch and two of them not to. The ones who stretched ran 10% faster.

Why, then, is this not statistically significant? Because the sample size was too small to infer anything useful, and the study was poorly designed.
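To make the sample size point concrete, here is a minimal sketch of an exact permutation test on the four-person study. The running times are made up for illustration (the post only says the stretchers were about 10% faster), and the function name is my own. With two people per group there are only six ways to split the four runners, so even the most extreme result cannot get a two-sided p-value below 2/6.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    as_extreme = 0
    total = 0
    # Enumerate every way of relabelling who "stretched".
    for idx in combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            as_extreme += 1
        total += 1
    return as_extreme / total


# Hypothetical running times in seconds (assumed, not from any real study).
stretched = [53.0, 55.0]      # mean 54.0, i.e. 10% less time than 60.0
not_stretched = [59.0, 61.0]  # mean 60.0

p_value = permutation_p_value(stretched, not_stretched)
print(f"p = {p_value:.3f}")  # p = 0.333
```

The observed split is the most extreme one possible, and the p-value is still nowhere near the conventional 0.05 cutoff. A study this small would report "no statistically significant difference" no matter how well stretching worked.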

All “not statistically significant” tells you is that you can’t infer anything from the study. But word the write-up carefully enough and you can have people believe the study proved there was no effect.

Have you ever heard the claim “There’s no statistically significant difference between going to an elite Ivy League school and an equally good state school”? Perhaps from here, here or even here?

Well, from this paper (via a comment in an Overcoming Bias post):

For instance, Dale and Krueger (1999) attempted to estimate the return to attending specific colleges in the College and Beyond data. They assigned individual students to a “cell” based on the colleges to which they are admitted. Within a cell, they compared those who attend a more selective college (the treatment group) to those who attended a less selective college (the control group). If this procedure had gone as planned, all students within a cell would have had the same menu of colleges and would have been arguably equal in aptitude. The procedure did not work in practice because the number of students who reported more than one college in their menu was very small. Moreover, among the students who reported more than one college, there was a very strong tendency to report the college they attended plus one less selective college. Thus, there was almost no variation within cells if the cells were based on actual colleges. Dale and Krueger were forced to merge colleges into crude “group colleges” to form the cells. However, the crude cells made it implausible that all students within a cell were equal in aptitude, and this implausibility eliminated the usefulness of their procedure. Because the procedure works best when students have large menus and most student do not have such menus, the procedure essentially throws away much of the data. A procedure is not good if it throws away much of the data and still does not deliver “treatment” and “control” groups that are plausibly equal in aptitude. Put another way, it is not useful to discard good variation in data without a more than commensurate reduction in the problematic variation in the data. In the end, Dale and Krueger predictably generate statistically insignificant results, which have been unfortunately misinterpreted by commentators who do not sufficient econometric knowledge to understand the study’s methods.

In other words, the study says no such thing. It simply says the study itself was not sufficient to show that Ivy League educations make you more money, because the data wasn’t good enough. Yet the media has twisted this into a positive assertion that state schools do indeed make you as much money as Ivy Leagues.

I’m generously inclined to believe that most cases of this error that I see are caused by incompetence, but it’s pretty trivial to see how this could be used for malice. Want the public to believe that Internet usage doesn’t cause social maladjustment? Just design a shitty study and claim “We found no statistically significant difference in social competence between heavy internet users, light internet users and non-users”. Bam, half the PR work has already been done for you.

Controlling for…

Here’s another statistical gem I see all the time:

An analysis done at the University of Washington showed that there was zero correlation between race and financial attainment after controlling for IQ, education levels, socioeconomic status and gender.

Heartwarming, right? It means that if we put blacks and whites in the same situation, they should earn the same amount of money. WRONG.

The key here is to see that we’re looking for financial attainment and controlling for socioeconomic status. Those two things mean the same damn thing. Basically, all this study told us was that being rich causes you to be rich.
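A small simulation can show the mechanism. This is a sketch under assumptions of my own, not a model of any real study: a binary `group` variable shifts income by 10 units, and `ses` is just income plus a little noise, so it is close to a restatement of the outcome. The adjusted gap is computed by residualizing both variables on `ses` (the Frisch-Waugh-Lovell approach), which gives the same coefficient as a regression of income on group and SES together.

```python
import random


def slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx


def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    b, a = slope_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


random.seed(0)
n = 5000

# Assumed data-generating process (illustrative only).
group = [random.randint(0, 1) for _ in range(n)]
income = [50 + 10 * g + random.gauss(0, 5) for g in group]
ses = [inc + random.gauss(0, 1) for inc in income]  # nearly the outcome itself

# Raw gap in income between the two groups.
raw_gap, _ = slope_intercept(group, income)

# Gap "after controlling for SES": regress residualized income
# on residualized group membership.
adjusted_gap, _ = slope_intercept(residuals(ses, group), residuals(ses, income))

print(f"raw gap:      {raw_gap:.2f}")
print(f"adjusted gap: {adjusted_gap:.2f}")
```

The raw gap comes out close to the 10 units built into the data, while the adjusted gap shrinks to a small fraction of that. Nothing about the underlying difference changed. The control variable simply soaked it up because it measures almost the same thing as the outcome.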

Most people view the “controlling for” section of statistical reporting as a sort of benign safeguard. Controlling for things is like… due diligence, right? The more the better… It’s easy to numb people into a hypnotic lull with a list of all the things you control for.

But controlling for factors means you get to hide the true cause of things under benign labels. That’s why I’m always so wary of studies that control for socioeconomic status or education levels, especially when they don’t have to. Sure, socioeconomic status might cause obesity, but what causes socioeconomic status?


When people do bother to talk about statistical manipulation, they usually focus on issues of statistical fact: Aggressive pruning of outliers, shotgun hypothesis testing and overly loose regressions. But why bother with having to sneak poorly designed studies past peer review when you can just publish a factually accurate study which implies a conclusion completely at odds with the data? That way, you sneak past the defenses of anyone who actually does know something about statistics.

Sometimes, I swear, the more statistically savvy a person thinks they are, the easier they are to manipulate. Give me a person who mindlessly parrots “Correlation does not imply causation” and I can make him believe any damn thing I want.

  • Frederic Christie

    Actually, SES is very different and can also be defined intergenerationally. Similarly, financial attainment means more than it sounds. You have to see how the study operationally defined it to conclude what you did.