Members of the public often see shocking statistics in advertising or the media, while scientists also deal with statistics as part of their research. However, statistics cannot always be taken at face value, and one useful tool for interpreting statistics is an understanding of the difference between correlation and causation.
To investigate a phenomenon, scientists design experiments using experimental and control groups. When analyzing the relationships between sets of research data, researchers need to bear in mind that correlation is merely an observed relationship between events or data sets, whereas causation is a proven relationship in which one factor directly causes another. In statistical terms: Correlation does not imply causation.
For example, someone may observe that people who buy diet soda are usually fat. In other words, there is some kind of relationship between generously built people and low-calorie drinks.
A researcher who is interested in the habits of fat people might conduct a survey and record statistics concerning how much people weigh and how much diet soda they consume in a given week. Statistical analysis might then reveal that the heavier people are, the more diet soda they drink, and the researcher will be able to state:
There is a direct correlation between body weight and the consumption of diet soda.
Now imagine that this statement is published in the press. Many people become convinced that drinking diet soda causes weight gain, and sales of diet soda plummet. However, those people would be acting on a misunderstanding, because the statement does not actually explain why the purchasers of diet soda are so fat. It merely points out a correlation, or relationship, between the two factors.
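To make concrete what such a correlation measures, here is a minimal sketch of the Pearson correlation coefficient, one common way of quantifying how closely two sets of numbers move together. The survey figures are invented purely for illustration, and the names `pearson_r`, `weights_kg` and `cans_per_week` are assumptions rather than anything taken from a real study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / sqrt(var_x * var_y)

# Hypothetical survey data, invented for illustration only
weights_kg = [60, 68, 75, 82, 90, 98, 105]
cans_per_week = [1, 2, 2, 4, 5, 5, 7]

r = pearson_r(weights_kg, cans_per_week)
print(f"r = {r:.2f}")
```

A value of r close to +1 means the two measurements rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship. Nothing in the calculation says which factor, if either, is the cause.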
Causation would actually mean that drinking diet soda has been proven to cause weight gain. This may sound ridiculous, but on Oprah.com there is an article by Dr. David Katz which argues that it does exactly that. On the other hand, it could equally well be a case of reverse causation: people drink diet soda because they are fat, and the fatter they are the more they drink.
But the relationship may be much more complex than that, and may involve other factors, such as how much people exercise, their income level, how much advertising they are exposed to, and what other beverage options are available.
One possibility is that another unknown factor is responsible for both weight gain and soda drinking. We could argue that fast food is high in salt and consequently causes people who eat it to become thirsty. In spite of drinking diet soda with their burger and fries, the more fast food they eat the fatter they become while at the same time drinking more diet soda.
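The fast food scenario can be illustrated with a small simulation. This is only a sketch under stated assumptions: the numbers are made up, and the model deliberately gives diet soda no effect on weight at all. Fast food consumption alone drives both soda drinking and body weight, yet the two still end up strongly correlated. Looking only at people who eat roughly the same amount of fast food makes most of that correlation disappear.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed model (hypothetical numbers): fast food meals per week is the
# hidden third factor. Soda has NO direct effect on weight here.
fast_food = [rng.uniform(0, 10) for _ in range(5000)]
soda = [0.8 * f + rng.gauss(0, 1) for f in fast_food]         # thirst from salty food
weight = [70 + 3.0 * f + rng.gauss(0, 5) for f in fast_food]  # weight gain from food

r_all = pearson_r(soda, weight)

# Hold the third factor roughly constant: only people eating 4 to 5 meals a week
subset = [(s, w) for f, s, w in zip(fast_food, soda, weight) if 4.0 <= f < 5.0]
r_fixed = pearson_r([s for s, _ in subset], [w for _, w in subset])

print(f"everyone:            r = {r_all:.2f}")
print(f"similar fast food:   r = {r_fixed:.2f}")
```

Comparing people who are alike with respect to a suspected third factor is one simple way researchers check whether a correlation survives once that factor is taken into account.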
There may also be a combination of causal relationships among the three factors. We could, for example, postulate that people drink diet soda because they are obese, and that the obesity is caused by fast food. Or there may be a self-reinforcing system in which obesity causes diet soda drinking, while at the same time diet soda drinking causes obesity. People drink diet soda because they are obese, but the diet soda interferes with their metabolism, causing them to be constantly hungry, so they eat more, become fatter and are driven to drink more diet soda in a desperate attempt to lose weight.
On the other hand, perhaps there is no relationship at all, and the correlation is merely a coincidence. Although fat people buying diet soda tend to stand out, careful observation may also reveal a lot of less noticeable slim people who are also buying diet soda. The researcher may also observe that many people put on weight even though they do not indulge in diet soda drinking at all.
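How easily coincidence can produce an apparent relationship is shown by the following sketch, which repeatedly draws two sets of completely unrelated random numbers and counts how often they happen to look strongly correlated. The sample size, number of trials and threshold are arbitrary values chosen for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 8   # a small survey: only eight people
TRIALS = 1000     # number of independent "studies" simulated
THRESHOLD = 0.7   # what we will call a "strong" correlation

strong_by_chance = 0
for _ in range(TRIALS):
    # Two variables generated independently: no real relationship exists
    a = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    b = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    if abs(pearson_r(a, b)) > THRESHOLD:
        strong_by_chance += 1

print(f"{strong_by_chance} of {TRIALS} unrelated pairs had |r| > {THRESHOLD}")
```

With samples this small, some of the simulated studies show a strong correlation even though the two variables have nothing to do with each other, which is one reason a single small survey deserves caution.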
In addition, researcher bias may lead people to come to a conclusion which they expect or want to reach. A researcher who believes that diet soda is poison could seize on the fact that fat people drink diet soda to support this opinion.
The same considerations apply to inverse or negative correlation, in which an increase in one factor relates to a decrease in another. For example, graphing the relationship between body weight and exercise may reveal an inverse correlation: the more people exercise, the less they weigh. Again, this merely demonstrates a relationship, and does not identify the reason. Causation needs to be established through further investigation. It could be argued that exercise causes weight loss, but it could also be argued that slimmer people exercise more because they find it easier and more enjoyable, or that both phenomena are caused by a third factor such as education or income level. Perhaps it is merely a coincidence, or even a perception resulting from researcher bias.
Causation may not always be obvious. For example, a British survey reported that smoking parents were more likely to have delinquent teenagers. Concluding from this that parental smoking causes delinquency could be erroneous, because the finding could equally be:
(a) an example of reverse causation: The parents smoke because of the stress involved in parenting delinquent children;
(b) the result of a third factor, poverty: Lower classes are more likely both to smoke and to have delinquent children; or
(c) a conclusion drawn by a biased researcher who was an adamant anti-smoker.
Advertising and breaking news stories often feature attention-grabbing statements concerning the results of “recent studies.” However, it is important to maintain a degree of healthy skepticism. One means of doing this is to look at the statements critically, bearing in mind the difference between correlation and causation.