Scientists rarely include their raw data in their published studies. Instead, they describe their methods of data collection and analysis, and present the final results of that analysis. Disclosing the methods and evidence is essential for assessing the validity of scientific research.
Why is disclosure important?
The method of data collection affects what data is collected. Too much flexibility in data collection can invalidate a result.
The most famous example is the polling that preceded the 1948 U.S. presidential election. Opinion polls can be very accurate in identifying shifts in voter opinion. However, the major polls taken before the election consistently predicted that Dewey would defeat Truman. Sampling errors caused the polls to miss the strong swing towards Truman. Disclosure of the polls’ methods could have identified the problems in data collection.
The end result was complicated even further by Dewey’s continued lackluster campaigning, a tactic which was specifically chosen because of the continued positive poll results. This is an example of how polls can alter the system they are examining.
A scientific study can also alter the system it studies, which may invalidate its findings. This is a constant struggle in anthropology and sociology, and is sometimes addressed through deliberate subject deception in psychological and health sciences studies. A study which employs deception must submit its proposed methods to its institution’s ethics review board, which decides whether the deception is acceptable in itself, as well as whether the possible findings justify the deception. The most common example of subject deception is the double-blind test against a placebo control.
The number of null results or non-responses is just as important as the analysis of the responses. However, journals rarely print studies that failed to support their hypothesis. This can result in a publication bias which makes published results seem stronger than they would be if the entire body of evidence were considered. Many countries are considering or have enacted legislation to address this point.
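The effect of publication bias can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real journal: the true effect size, the sample size per study, and the rule that only significantly positive studies get printed are all hypothetical values chosen for illustration.

```python
import random
from statistics import fmean

def simulate_literature(true_effect=0.1, n_per_study=50, n_studies=2000, seed=1):
    """Compare the average estimate across all studies with the 'published' subset.

    Assumptions (illustrative only): observations have unit variance, and a
    study is published only if its estimate is significantly positive (z > 1.96).
    """
    rng = random.Random(seed)
    se = (1 / n_per_study) ** 0.5  # standard error of a mean with unit variance
    all_estimates, published = [], []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1) for _ in range(n_per_study)]
        estimate = fmean(sample)
        all_estimates.append(estimate)
        if estimate / se > 1.96:  # hypothetical journal filter
            published.append(estimate)
    published_mean = fmean(published) if published else float("nan")
    return fmean(all_estimates), published_mean, len(published) / n_studies

all_mean, published_mean, share_published = simulate_literature()
print(f"mean estimate, all studies:       {all_mean:.3f}")
print(f"mean estimate, published studies: {published_mean:.3f}")
print(f"share of studies published:       {share_published:.1%}")
```

Under these assumptions the average over all studies stays close to the true effect, while the average over the published subset is necessarily larger, because only estimates above the significance threshold pass the filter.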
Companies also can’t make money off a null result. Pharmaceutical companies conduct thousands of studies each year, but usually release only those few studies which show a favorable result, regardless of how many studies showed a null result.
Polls are particularly notorious for non-disclosure of null results and invalid populations. Reputable polling companies are careful to indicate the percentage of non-respondents, but many pollsters are looking for particular results rather than accurate results. One poll presented its findings as 1 out of every 5 people agreeing with the question, in a sample of 1,508 completed responses. The poll did not mention that 41,033 people had been asked the question, so fewer than 4% of them had responded, and the pollster had made no attempt to ensure that the people who answered were representative of the reported subject pool. Nearly every Internet poll has the same built-in problem, magnified.
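The arithmetic behind that poll is worth working through. The sketch below uses only the figures quoted above (41,033 asked, 1,508 completed, 1 in 5 agreeing); the function name and the returned field names are made up for illustration.

```python
def poll_summary(asked: int, completed: int, agree_fraction: float) -> dict:
    """Summarise a poll relative to everyone who was asked, not just respondents."""
    if asked <= 0 or not 0 <= completed <= asked:
        raise ValueError("need asked > 0 and 0 <= completed <= asked")
    agreeing = completed * agree_fraction  # respondents who agreed
    return {
        "response_rate": completed / asked,         # share of those asked who answered
        "nonresponse_rate": 1 - completed / asked,  # share who did not answer
        "agree_among_respondents": agree_fraction,
        "agree_among_asked": agreeing / asked,      # agreement actually observed in the full pool
    }

summary = poll_summary(asked=41_033, completed=1_508, agree_fraction=0.20)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

The response rate comes out at roughly 3.7%, so the "1 out of every 5" figure describes the opinions of a small, self-selected fraction of the people who were asked. The opinions of the remaining 96% are simply unknown.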
Undisclosed flexibility in data collection and analysis also increases the rate of false positives in research. Under current standards of research disclosure, the false positive rate can rise above the nominal maximum of 5%. This results from the ambiguity inherent in statistical analysis decisions, as well as from researchers’ desire to find a positive result.
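One source of this inflation, an undisclosed flexible stopping rule, can be sketched in a few lines. The simulation below is an illustration under assumptions, not a reproduction of any published analysis: both groups are drawn from the same normal distribution with known unit variance (so a simple z-test applies), and the peeking schedule of 20, 30, 40 and 50 observations per group is a hypothetical choice.

```python
import random
from statistics import NormalDist, fmean

def p_value(a, b):
    """Two-sided z-test for equal means; assumes equal group sizes and unit variance."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(peeks, n_sims=5000, alpha=0.05, seed=0):
    """Share of simulated null studies that reach p < alpha at any of the given peeks."""
    rng = random.Random(seed)
    max_n = max(peeks)
    hits = 0
    for _ in range(n_sims):
        # Both groups share one distribution, so every "significant" result is a false positive.
        a = [rng.gauss(0, 1) for _ in range(max_n)]
        b = [rng.gauss(0, 1) for _ in range(max_n)]
        # The researcher stops and reports success at the first significant peek.
        if any(p_value(a[:n], b[:n]) < alpha for n in peeks):
            hits += 1
    return hits / n_sims

fixed = false_positive_rate(peeks=[50])                # stopping rule fixed in advance
flexible = false_positive_rate(peeks=[20, 30, 40, 50]) # test repeatedly, stop when significant
print(f"fixed stopping rule:      {fixed:.3f}")
print(f"peeking every 10 samples: {flexible:.3f}")
```

With the stopping rule fixed in advance, the simulated rate should sit near the nominal 5%. With repeated peeking it comes out noticeably higher, even though every individual test uses the same 5% threshold.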
Disclosure-based requirements for scientific authors
Joseph Simmons, Leif Nelson, and Uri Simonsohn have proposed a list of requirements for scientific authors which is intended to reduce the problem of overly flexible data collection and analysis.
1. Authors must decide the rule for terminating data collection before data collection begins, and disclose this rule in their article.
2. Authors must collect at least 20 observations per statistical cell, or else provide a compelling cost-of-data-collection justification.
3. Authors must list all variables collected in a study.
4. Authors must report all experimental conditions, including failed manipulations.
5. If observations are eliminated, authors must also report what the statistical results are when those observations are included.
6. If an analysis includes a covariate, authors must report their statistical results both with and without the covariate.
These recommendations would improve the average quality of data. They would also decrease the total number of publications.