Statistics

Alabama is home to a traditional college football rivalry between the University of Alabama and Auburn University. On the steps of a building at the University of Alabama, a woman conducts a poll. “Are you a ’Bama fan or an Auburn fan?” she asks, her pen poised over her clipboard. What kind of statistical results might be reported by a poll conducted in this manner? The interviewer could report that nineteen out of twenty football fans in the state of Alabama are ’Bama fans, and she could report this conclusion without falsifying any of her data. This example of data collection demonstrates how statistics can be manipulated.

Statistics, in the sense used here, is the study of a sample that is assumed to represent an entire population and that is used to make inferences about that population. In the example above, the interviewer assumed that the individuals she encountered on the steps of that building represented all football fans in Alabama.
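The gap between a convenience sample and a random one can be sketched in a few lines of Python. Every number below is invented for illustration and is not survey data: the sketch assumes a state of 10,000 fans, 1,000 of whom can be found on campus and favor ’Bama 19 to 1, while the rest are split evenly. By construction the true ’Bama share is 0.545.

```python
import random

# Invented numbers for illustration only; they are not real survey data.
CAMPUS_FANS = ["Bama"] * 950 + ["Auburn"] * 50      # fans found on the campus steps
OTHER_FANS = ["Bama"] * 4500 + ["Auburn"] * 4500    # fans everywhere else in the state
ALL_FANS = CAMPUS_FANS + OTHER_FANS


def bama_share(fans):
    """Return the fraction of the given fans who answered 'Bama'."""
    return fans.count("Bama") / len(fans)


def poll(fans, sample_size, rng):
    """Ask sample_size distinct fans drawn at random from the given group."""
    return bama_share(rng.sample(fans, sample_size))


rng = random.Random(0)  # fixed seed so the run is repeatable
true_share = bama_share(ALL_FANS)
campus_estimate = poll(CAMPUS_FANS, 200, rng)    # only people on the steps
statewide_estimate = poll(ALL_FANS, 200, rng)    # anyone in the state

print(f"true share of 'Bama fans:   {true_share:.3f}")
print(f"campus-steps poll estimate: {campus_estimate:.3f}")
print(f"statewide random estimate:  {statewide_estimate:.3f}")
```

Both polls interview the same number of people and report their answers honestly. Only the choice of where to stand differs, and under these assumptions that choice alone pushes the campus estimate far above the true share.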

Consider a statistic on WebMD.com that claims, “At least one out of every 5 Americans suffers from allergies.” How does WebMD.com know this? Did WebMD.com knock on your door and ask you if you suffer from allergies? WebMD.com could not possibly have allergy data for every single American, so it must estimate the whole from a sample. Statistical analysis is only as good as the integrity of the sample it draws on. What if all the patients interviewed lived in Georgia, where pollen counts are high? Would one in five individuals in Wisconsin, where pollen counts are lower, suffer from allergies?

Statistical analysis is also only as good as the sample size. What if five people on the street were asked whether they suffered from allergies, and three of the five said that they did? Would it be accurate to say that three in five people suffer from allergies? Absolutely, it would. So the question becomes: which people?
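To see how little a five-person sample says about a larger population, here is a minimal Python sketch. It assumes, purely for illustration, that the true allergy rate is exactly one in five and that each person interviewed is an independent random draw. The function names are hypothetical helpers, not part of any library.

```python
import math
import random

TRUE_RATE = 0.2  # assumed true allergy rate, borrowed from the "one in five" claim


def run_poll(sample_size, rng):
    """Interview sample_size simulated people; return the share reporting allergies."""
    sufferers = sum(rng.random() < TRUE_RATE for _ in range(sample_size))
    return sufferers / sample_size


def estimate_range(sample_size, polls, rng):
    """Repeat the poll many times; return the lowest and highest estimates seen."""
    estimates = [run_poll(sample_size, rng) for _ in range(polls)]
    return min(estimates), max(estimates)


def prob_exactly(k, n, p):
    """Binomial probability that exactly k of n independent people say yes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


rng = random.Random(0)  # fixed seed so the run is repeatable
small_low, small_high = estimate_range(5, 1000, rng)
large_low, large_high = estimate_range(500, 1000, rng)

print(f"5-person polls:   estimates from {small_low:.2f} to {small_high:.2f}")
print(f"500-person polls: estimates from {large_low:.2f} to {large_high:.2f}")
print(f"chance of exactly 3 of 5: {prob_exactly(3, 5, TRUE_RATE):.4f}")
```

Under these assumptions, a three-in-five result turns up in roughly 5% of five-person polls even though the assumed true rate is one in five. The estimates from 500-person polls cluster much more tightly around that rate.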

WebMD.com is a reputable source of medical information and has no motivation for manipulating statistics. Commercial organizations seeking to sell something might be another story. Consider a claim like, “preferred two-to-one over the competitor.” Preferred by whom? If Coca-Cola is preferred two-to-one over Pepsi in Atlanta, the home of the Coca-Cola Company, is that representative of the whole American population? What about the world population? Yet claims like these are made in advertising all the time. Do they sway you?

Statistics are fallible because they rely on a sample to represent an entire population. Because of this, statistics can be easily manipulated. Any time a statistic is reported, it is fair to question how the data were collected and analyzed. So the next time you hear that the majority of Alabama residents prefer Alabama football to Auburn football, be sure to ask which Alabama residents the study means.