It is not unheard of to find a perfect one-to-one correlation between ice cream sales in 1957 and NFL football outcomes in 2010, but does that make the data valid? Because statistics influences practitioners and non-practitioners alike, it is prudent to discuss its idiosyncrasies and common mistakes in order to better understand how it can be manipulated.
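To make the opening point concrete, here is a minimal sketch using invented numbers. Two series are generated by separate random generators. One stands in for yearly ice cream sales and the other for a team's points per season. The only thing they share is an upward drift over the index. The series names and values are hypothetical. The `pearson` helper is written out by hand rather than taken from a library.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two independent random generators: nothing links one series to the other.
sales_rng = random.Random(1957)
football_rng = random.Random(2010)

# Hypothetical data: each series is an upward trend plus its own noise.
ice_cream_sales = [50 + 3 * t + sales_rng.gauss(0, 2) for t in range(20)]
points_per_season = [10 + 0.5 * t + football_rng.gauss(0, 0.5) for t in range(20)]

r = pearson(ice_cream_sales, points_per_season)
print(f"r = {r:.3f}")  # high, despite no causal connection between the series
```

The high r comes from the shared trend, not from any real link. A common first check is to detrend both series, for example by correlating year-over-year changes instead of raw levels.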
Academic statistics involves digesting formulae and theorems, manipulating data and judging whether that data is valid. Reputable universities spend considerable effort on error analysis, biased data and statistical significance; however, the ultimate test of statistical fortitude comes from field application.
We’ve all heard or read the phrase “In God we trust, all others bring data!” The irony is that a junior analyst can convince upper management to sign off on a significant expense on the strength of the mystique of statistics alone. Sound statistical analysis requires true data; false data produces false results. So where do statistics get manipulated?
Statistical significance is a fundamental first-year topic, and industry understands its importance. For instance, a machine shop may wish to change the guarding on its twenty or so machining centers in an attempt to reduce cleanup time and, ultimately, save the company money. This is a realistic trend in industry, and a great deal of wealth has been distributed amongst consultants and trainers in the name of Six Sigma and other data-driven decision-making strategies.
Ideally, an investigator would trial a mock-up on one center, measure the results, test the data for a statistically significant difference (in this case, a reduction in cleanup time) and then decide whether to spend additional resources implementing the modification on the other equipment. The theory is sound, but in practice things get tricky.
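The trial described above comes down to comparing cleanup times before and after the mock-up. The sketch below is minimal and assumes invented cleanup times in minutes for one machining center. It uses a permutation test, which is swapped in here as a library-free alternative to the t-test taught in many courses. It also applies the 0.05 threshold that corresponds to 95% confidence. The function name `permutation_test` and the data are hypothetical.

```python
import random
from statistics import fmean

def permutation_test(before, after, n_resamples=10_000, seed=0):
    """One-sided permutation test of whether `after` has a lower mean than `before`.

    Returns the observed reduction in the mean and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = fmean(before) - fmean(after)
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign the before/after labels at random, as if the change did nothing.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_before]) - fmean(pooled[n_before:])
        if diff >= observed:
            extreme += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical cleanup times in minutes on one machining center.
before = [42, 38, 45, 40, 44, 39, 41, 43]
after = [35, 37, 33, 36, 38, 34, 36, 35]

reduction, p = permutation_test(before, after)
print(f"mean reduction = {reduction:.1f} min, p = {p:.4f}")
print("significant at 95%" if p < 0.05 else "not significant at 95%")
```

A small p-value here only says that the difference on this one center is unlikely to be chance. It says nothing about whether the sample represents the other shifts or machines.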
For instance, suppose the investigator works only the day shift and measures only the day shift, assuming the other two shifts behave the same. In that case, labor bias may be introduced into the data. There is also the common issue of workers changing their behavior when observed, along with other variables that we could not possibly cover in the scope of this article. The point is that this investigator will perform a calculation to show statistical significance, probably at the 95% confidence level. That calculation will show an improvement, and the investigator will decide to implement the change. Whether there is truly an improvement will not be evident until after the change.
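A few invented numbers can show the day-shift problem. In this minimal sketch, the guarding change helps the day crew but barely touches the afternoon and night shifts. All values are hypothetical and given in minutes. Measuring the day shift alone reports the day-shift effect, not the plant-wide one. The helper name `mean_reduction` is illustrative.

```python
from statistics import fmean

# Hypothetical cleanup times (minutes) per shift, before and after the change.
before = {
    "day": [40, 42, 41, 39],
    "afternoon": [48, 50, 47, 49],
    "night": [55, 53, 56, 54],
}
after = {
    "day": [33, 34, 32, 35],
    "afternoon": [47, 49, 48, 48],
    "night": [55, 54, 56, 53],
}

def mean_reduction(before, after, shifts):
    """Drop in mean cleanup time, pooling only the listed shifts."""
    b = [x for s in shifts for x in before[s]]
    a = [x for s in shifts for x in after[s]]
    return fmean(b) - fmean(a)

day_only = mean_reduction(before, after, ["day"])
all_shifts = mean_reduction(before, after, list(before))

for shift in before:
    print(f"{shift:>9}: {mean_reduction(before, after, [shift]):.1f} min")
print(f"day shift only: {day_only:.1f} min, all three shifts: {all_shifts:.1f} min")
```

With these made-up figures, the day shift shows a 7.0 minute reduction but the plant as a whole shows only 2.5 minutes. The usual remedy is to sample every shift. The change in behavior when workers know they are observed is not modeled here.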
There is no accusation that the investigator is manipulating statistics. Rather, the circumstances under which the data were collected steer the investigator's decision without anyone intending it. This is a common mistake in North American manufacturing today, particularly in the automotive industry.
For a simpler example, let us look at the descriptive statistics we encounter every day in the news and media. The great debate over ‘Global Warming’ has reached a point where the opposition has been vindicated after 20 years of being the “bad guy” for saying that global warming does not exist in the way the media has portrayed it.
Several reputable researchers have endorsed data from realistic models supporting the claim of global warming. These same researchers had previously denied the existence of the phenomenon and later changed their opinion, citing new data and better statistics. The purpose of this discussion is not to debate whether global warming is real but to highlight the great power of data and endorsement over public perception.
In the case of global warming, the data rest on assumptions, trend analysis and the researcher’s favorite tool in their arsenal, the multiplier, better known as “the fudge factor”. Multipliers are a great way to create statistical significance or to bring two trends closer to each other for better correlation. The argument for these multipliers usually relies on other supporting data or theory that lends validity to using such a value. In some cases the multiplier is a simple coefficient. In others it is a polynomial equation that allows a curve to be adjusted at multiple points across its range.
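The point about polynomial multipliers has a simple mathematical core: with enough free coefficients, a curve can be made to pass exactly through any set of points. The minimal sketch below assumes six arbitrary, invented data points. It uses Lagrange interpolation, a standard technique swapped in here for illustration, to build a degree-five polynomial that fits them with zero error. It then evaluates that polynomial just outside the data. The names `lagrange`, `years` and `values` are hypothetical.

```python
from fractions import Fraction

def lagrange(xs, ys):
    """Return the unique polynomial of degree < len(xs) through every (x, y) point."""
    def p(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = Fraction(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    # Exact rational arithmetic avoids floating-point residue.
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p

# Hypothetical, essentially patternless observations.
years = [0, 1, 2, 3, 4, 5]
values = [3, -7, 12, 0, 5, -2]

curve = lagrange(years, values)
residuals = [curve(x) - y for x, y in zip(years, values)]
print("residuals:", residuals)          # every residual is exactly zero
print("forecast at x = 6:", curve(6))   # -312, far outside the range of the data
```

A perfect in-sample fit is therefore no evidence that the adjusted model is right. The telling test is how the curve performs on data it was not tuned to.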
It is not difficult to see how this can get out of hand. Most people trust experts without questioning their sources; why would they not, when they are the experts? The conclusion to this example is that several other reputable scientists and researchers have recently shown correlation, without multipliers, to another model. That model is based on solar flare patterns and a longer trend history, and it indicates that we are not experiencing global warming. Since this development there has been no rebuttal to the opposition.
There is no accusation of malice or skulduggery in the global warming example. However, given the errors discussed in that example, it is not hard to imagine other groups, organizations and collaborations using the same methodology to intentionally persuade others to follow or support their cause.
In fields where peer review amounts to a minor check of credentials and data are not required to be statistically scrutinized, confidence levels can be adjusted until the desired result is obtained. For instance, suppose a local media report states that “there is a statistical chance that you can die from eating avocados”. Suppose also that the claim rests on a study of moldy avocados at a confidence level below 80%. Only that opening phrase needs to be broadcast to elicit fear and panic in the avocado-eating community. Was there a control group? How many would you need to eat? Was an anti-avocado lobby group funding the research? Undoubtedly, these questions would not be raised.
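A small worked example shows how much the confidence level matters. The sketch below assumes a hypothetical study that reports a mean effect of 1.5 units, a standard deviation of 4 and 25 subjects. It uses a normal-approximation interval; a t-interval would be slightly wider at this sample size. At 95% confidence the interval includes zero, so no effect has been demonstrated. At 80% confidence the same data suddenly look “significant”. The function name `z_interval` is illustrative.

```python
from statistics import NormalDist

def z_interval(sample_mean, sd, n, confidence):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * sd / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical study summary: mean effect 1.5, standard deviation 4.0, n = 25.
for level in (0.95, 0.80):
    low, high = z_interval(1.5, 4.0, 25, level)
    verdict = "excludes 0" if low > 0 or high < 0 else "includes 0"
    print(f"{level:.0%} interval: ({low:.2f}, {high:.2f}) {verdict}")
```

Nothing about the data changes between the two lines of output. Only the chosen confidence level changes, which is why a reported result should always state the level it was tested at.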
My advice to any information seekers out there is to look at the data yourself if possible and to look for research that has been peer reviewed without opposition. Review the background science or theory. Make a decision based on those principles. If your decision is wrong, at least it will be an error made with due diligence rather than through blind acceptance.