It has been said that there are three kinds of lies: lies, damned lies, and statistics. The quip is often attributed to Mark Twain, who popularized it but himself credited it to British Prime Minister Benjamin Disraeli. “Figures often beguile me,” wrote Twain, “particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force.”
Twain did more than merely popularize Disraeli’s remark; he admitted that he himself was often “beguiled” by figures. Twain was nobody’s fool; that even as keen an intellect as his could be so beguiled is a testament to statistics’ bewitching power over the mind.
What gives statistics such power to beguile? One reason is the simple fact that the results are couched in the language of numbers, and numbers have an exactness which other languages can rarely equal; the listener or reader immediately gets an impression of precision which, for the quoted statistics, may be only an illusion. Another is that statistics are usually rooted in actual measurements, which carry far more persuasive power than general impressions (as they should). Furthermore, statistics often represent the result of some numerical analysis which the reader is unable to reproduce, and which has an air of sophistication and insight by virtue of its mysterious complexity. Few lay readers are likely to argue with a statement like “The data are well approximated as a superposition of a linear trend and ARMA(1,1) noise.” Few have any idea what ARMA(1,1) noise is, few really understand all the implications of the phrase “linear trend,” and for most, the real meaning of “well approximated” is a mystery.
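For the curious, the jargon describes something quite simple. Here is a minimal sketch (in Python, using numpy and the statsmodels library; the slope and the coefficients phi and theta are made-up values chosen purely for illustration) that simulates exactly such a series:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Purely illustrative parameters: a gentle upward trend plus
# mildly autocorrelated ARMA(1,1) noise.
rng = np.random.default_rng(42)
n = 200

# ARMA(1,1): x_t = phi * x_{t-1} + e_t + theta * e_{t-1}.
# statsmodels expects the lag polynomials [1, -phi] and [1, theta].
phi, theta = 0.6, 0.3
noise = ArmaProcess(ar=[1, -phi], ma=[1, theta]).generate_sample(
    nsample=n, distrvs=rng.standard_normal
)

t = np.arange(n)
series = 0.05 * t + noise  # the "linear trend" is just the 0.05*t line
```

An ARMA(1,1) process is merely noise with short-term memory, in which each value is partly correlated with its predecessor; the “linear trend” is nothing more exotic than a straight line added on top.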
There are many ways in which even an honest researcher can arrive at misleading statistics; this is a complex branch of mathematics, with many subtleties and lots of surprises. But competent, honest analysts arrive at valid results in the vast majority of cases; misleading statistics are more often the result of deliberate effort. For example: “Four out of five dentists recommend Super-Duper-brand toothpaste!” What you’re not told is that the advertiser polled over 1,200 groups of five dentists before finding one in which four of them recommended Super-Duper brand.
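A quick back-of-the-envelope calculation shows why this trick is almost guaranteed to work. Here is a sketch in Python (using scipy; the assumed true recommendation rate of 30% is entirely hypothetical):

```python
from scipy.stats import binom

# Hypothetical assumption: only 30% of dentists actually
# recommend the brand.
p_true = 0.30

# Probability that a random group of 5 contains at least 4
# recommenders: P(X >= 4) for X ~ Binomial(n=5, p=0.30).
p_group = binom.sf(3, n=5, p=p_true)  # sf(3) = P(X > 3) = P(X >= 4)
print(f"P(4+ of 5 recommend): {p_group:.4f}")  # about 0.031

# Probability of *never* finding such a group in 1,200 tries:
p_miss_all = (1.0 - p_group) ** 1200
print(f"P(no lucky group in 1,200): {p_miss_all:.1e}")  # about 5e-17
```

Even under this unflattering assumption, roughly one group in thirty will deliver the desired headline, so with 1,200 groups the advertiser is essentially certain to find one.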
Even if only one group of five dentists is surveyed, and four of them recommend Super-Duper brand, you’re not told that with a sample of only five dentists, the uncertainty in our estimate of the “recommendation rate” is very high: so high that we can’t even be sure whether the rate is greater or less than 50%. Unless you survey every practicing dentist in the world, any estimate of the recommendation rate is only that: an estimate, with uncertainty attached to it. In scientific publications, the statistical uncertainty associated with an estimate is almost always given; without it, the results are considered far less credible. But in communication with the general public, the uncertainty is almost never given. Most readers wouldn’t know what to do with it anyway, and it’s considered unnecessary for making one’s point.
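To make the size of that uncertainty concrete, here is a minimal sketch (again in Python, using the statsmodels library) that computes an exact Clopper–Pearson 95% confidence interval for the rate, given 4 recommendations out of 5:

```python
from statsmodels.stats.proportion import proportion_confint

# Exact (Clopper-Pearson) 95% confidence interval for the
# recommendation rate, given 4 "successes" in 5 trials.
low, high = proportion_confint(count=4, nobs=5, alpha=0.05, method="beta")
print(f"95% CI for the recommendation rate: ({low:.3f}, {high:.3f})")
# -> roughly (0.284, 0.995): the interval straddles 50%, so the
#    data alone can't tell us whether most dentists recommend it.
```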
Another common way to generate misleading statistics, especially applicable to surveys, is to ask leading questions. Suppose a funding initiative involving a new tax is proposed for the local school system, and the initiative is placed on the ballot. One group might conduct a survey asking, “Do you support our children’s future by providing higher-quality education through initiative 17?” Of course you do! Almost nobody opposes supporting our children’s future by providing higher-quality education. But another group asks, “Do you support the creation of a new government bureaucracy funded by yet more taxes through initiative 17?” Of course not! Each group will get a “statistical” result which supports its own agenda; the questions are designed to do exactly that. Designing truly objective questions which don’t bias the result is extremely difficult, a task on which professional polling organizations expend tremendous effort. The results of statistical analysis can be no better than the data which feed it, and getting good-quality data takes a lot of work and great care.
Of course statistics from advertisers and political pundits can’t be trusted. But even those with the best intentions must be very careful to avoid letting personal preference influence their results. Scientists are human beings, subject to all the thousand natural shocks that flesh is heir to, and it’s far too easy to skew the results even when you don’t want to, because desire influences the outcome subconsciously. That’s why, ideally, experiments should be double-blind: not only does the test subject not know which treatment is being administered, neither does the researcher. It’s not that researchers are assumed to be dishonest; it’s that human nature makes it nearly impossible to prevent one’s preferences from influencing the outcome.
Does this mean we should abandon statistics altogether? Is it just too prone to abuse, even with the best of intentions, to give reliable results? No! Results based on actual measurements really do give more reliable information than those based on impressions. And as complex as some statistical methods are, they’re also immensely powerful, enabling us to characterize things with far more insight than would otherwise be possible. Abandoning statistical analysis would, frankly, cripple a great deal of scientific (and other) research. But the vulnerability to error is so great that a statistical analysis requires great care and considerable expertise before its results can be considered reliable.
If the task is so difficult for professionals, what can the lay reader do to guard against being misled? One defense is to realize that this is truly a difficult endeavor, so the level of expertise behind a result is a clue to its legitimacy. Another is to be aware that if those reporting statistics have an agenda, especially a strong one, and especially a political one, then personal bias is likely to invalidate the results. If you want to know who’s likely to carry your state in the upcoming election, pay no heed to “statistics” from the Democrats or Republicans; the Gallup or Harris polling agencies work very hard to avoid bias, but political parties work hard to legitimize it.
One of the best indicators of reliability is reproduction of the same result from independent data and independent researchers. If one group announces that coffee is bad for your health, it’s a possibility. If two independent groups reach the same conclusion, it’s likely to be correct. If dozens, or hundreds, of independent research teams conclude that coffee is bad for your health, then you should stop drinking coffee. Let’s hope that never happens!
As a professional statistician, I’m well aware of the ease with which statistics can be manipulated to give a false impression. Statistics is like dynamite: in the right hands it’s a powerful tool that can move mountains for the good of humanity, but in the wrong hands its destructive power is frightening.