I shed an invisible tear whenever I hear “correlation does not imply causation,” a phrase the otherwise excellent Swivel (a website about correlations) emphasizes. Of course, there’s truth to it. It saddens me because:
It’s dismissive. It is often used to dismiss data from which something can be learned. The life-saving notion that smoking causes lung cancer was almost entirely built on correlations. For too long, these correlations were dismissed.
It’s misleading. In real life, nothing unfailingly implies causation. In my experience, every data set has more than one interpretation. To “imply” causation requires diverse approaches, and correlations are often among them.
It’s a missed opportunity — namely, an opportunity to make a more nuanced statement about what we can learn from the data.
It’s dogmatic (see “Jane Jacobs on Scientific Method”). Some correlations, such as those from “natural experiments,” imply causation much more than others. I suspect it does more harm than good to lump all of them together.
This entry was posted on Saturday, February 10th, 2007 at 11:42 pm and is filed under scientific method.
February 11th, 2007 at 10:13 am
Well said.
In the end, all we really have is correlation — all scientific models are based on correlation of model predictions with observations. If our observations don’t match model predictions, then the model is abandoned or changed.
And you are exactly right that correlation is a good starting point for looking for cause/effect models.
So I suppose the phrase should really be “correlation does not imply that there is a cause-effect model that will stand up to other experiments or analysis”.
February 11th, 2007 at 1:03 pm
Yes, that would be an improvement. My candidate would be something like “Correlations are generally (not always, but usually) less persuasive evidence for causality than experiments,” which makes the implied comparison explicit.
February 11th, 2007 at 11:50 pm
Well, if you’re asking for a catchy replacement phrase, I think that “correlation suggests causation” isn’t bad. It can be used to reply to people saying “correlation doesn’t imply causation”. You can say “but it *does* suggest it”.
February 12th, 2007 at 1:20 am
The Swivel people, on their blog, link to this description of correlation on Good Math, Bad Math, which seems to get it about right.
So if they agree with the post they approvingly linked to, they seem to understand the issues. Their site appears to be mainly concerned with public data gathered under uncontrolled circumstances, so in most of its use cases much caution *is* needed when inferring causality. Given their (presumably statistically naive) audience, their approach may be justified.
February 12th, 2007 at 5:09 am
The statement “correlation does strongly suggest a causal relationship” strikes me as too strong. It’s too easy to think of exceptions.
The Good Math, Bad Math essay you link to goes on to say:
This does not describe how scientists in my field (experimental psychology) act. The usual practice is to infer that X causes Y if Y changes every time you (intentionally) change X. The author is a computer scientist; that explains the lack of familiarity.
February 12th, 2007 at 6:51 am
Yeah, that point about inference based on manipulating random variables was brought up in the comments.
But do you really think that statement goes too far? I mean, a case could be made that correlation virtually always implies a causal relationship, just not necessarily a direct one. If X and Y are correlated and both caused by Z, they have a causal relationship with each other: an indirect one via Z. So saying that “correlation does strongly suggest a causal relationship” seems to me just to mean that when you see correlation, you should definitely look for the causal relationship, but it doesn’t tell you to assume that it’s a direct relationship (or, if it is, which direction it’s in).
February 12th, 2007 at 7:29 am
I am not familiar with the concept of an “indirect” causal relationship. If I were, I might agree with you. To me, “causal relationship” means what you call “direct causal relationship.”
February 12th, 2007 at 5:02 pm
Seth, great post, you are spot on. We basically wimped out after we launched Swivel, in the face of the barrage of protests from academics and data vigilantes about our correlation meters. Even the Wikipedia article about correlations says something along the lines of, “correlation doesn’t always mean causation but it sure is a hint.” And when someone says, well, maybe two trends are correlated because something else is causing both to change, that’s great: it made you think about what that third thing could be. So I took out the link from our code just now, and it will disappear after we refresh our site. We’re not some kind of anointed data gatekeepers trying to protect people from the dangers of correlation. People, look at the data and make up your own mind. That’s what Swivel is about.
February 12th, 2007 at 5:43 pm
In the end, all we really have is correlation
Tim, this isn’t entirely fair to say. In many fields (and in mine in particular) all we choose to look at are correlations. However, we also have temporal precedence. If one thing doesn’t precede the other thing, it is hard to argue for a causal relationship no matter how nicely the correlational analysis works out.
a case could be made that correlation virtually always implies a causal relationship
If X and Y are correlated, it suggests at least two models, (1) X -> Y and (2) Y -> X, one of which is probably wrong. This is before considering any common causes of both.
February 12th, 2007 at 7:12 pm
Thanks, Dmitry, I really appreciate that comment. You should know.
That you got a “barrage of protests from academics and data vigilantes” is interesting. There’s something backwards about that. Surely the point of data collection and analysis is to learn something. Just like you say, it makes a lot more sense to emphasize what you do learn rather than what you don’t.
February 13th, 2007 at 2:06 am
An indirect relationship is where X -> A -> B -> Y, or A -> B -> X and A -> C -> Y. Of course, there’s a limit to how indirect it can be before the limited correlations at each step, multiplied together, make the effect of A, B, and C on X and Y negligible. If that path doesn’t explain all the correlation, you look for additional mechanisms.
February 14th, 2007 at 2:43 pm
I happened to see this quote from Jensen just now:
Of course, a statistical correlation between two variables doesn’t imply direct causality of the one variable by the other, nor does it imply the absence of causation. A lack of correlation, however, is more apt to imply the absence of causation. Investigation can’t afford to eschew correlations as clues to causation, which of course must be established by other kinds of analysis. At an exploratory stage of investigation, reciting the “correlation is not causation” mantra is a premature criticism.
http://psycprints.ecs.soton.ac.uk/archive/00000019/
Seth, Jensen says that inspection time is highly g-loaded, looks like another possible metric for measuring mental acuity re omega-3 intake…
February 14th, 2007 at 3:59 pm
Re the Jensen quote: I was once interviewing a job candidate in my office. She said something like “but of course correlation does not imply causality.” I said, “Wait a minute. It’s not that simple. Doesn’t the absence of correlation suggest the absence of causality?” She thought about it, and agreed with me.
Re inspection time: That wouldn’t be a good measure. To determine inspection time you show stimuli very briefly (e.g., 200 msec) many times. If a whole trial takes 4 seconds, then only a small fraction of it (200 msec/4000 msec, or 5%) involves brain work. That’s the opposite of what I want.
February 13th, 2008 at 8:31 pm
It seems to me that the statement “correlation does not imply causation” is not meant to be dismissive (in general), but to note for a public unfamiliar with the subject that just because one factor tends to follow another, you should not immediately assume one to be the cause of the other.
Imagine if every time someone noticed two factors that tend to follow each other, they began to cry ‘causation!’ You might have people believing that, because the numerals of a clock climb as the day becomes lighter, the clock causes the change of the day. If observing the clock and the day outside were all the consideration they ever gave it, the clock might seem to control the day. However, you need merely reset the clock to see that this action has no effect on the day outside, and the theory is broken. The further investigation that this phrase calls for would disprove the idea of causation between these two, and the correlation as well, for that matter. Further investigation would reveal that it was the passage of time itself that led the clock to move (when properly set) and the days to turn.
Returning to the point, this phrase doesn’t mean (in my opinion) that two factors are not related, merely that correlation alone isn’t enough to reach a satisfactory conclusion about the matter. Other evidence is needed before a conclusion is drawn. I can see how, used in the wrong way, it could turn into undue criticism, but I also think it serves a good purpose in asking people to think twice about a given example rather than take what could merely be coincidence as fact.
February 14th, 2008 at 12:18 am
I don’t mind educating the public. I mind unbalanced views of evidence. To say what’s bad about evidence without also saying what’s good is misleading. Which is bad education.