Basic statistical methods are used to establish the soundness of a basic cause-and-effect hypothesis. There are tests for the internal and external validity of the data supporting the hypothesis, for whether a sample is representative of the whole population, and for the strength of the cause-and-effect relationship between variables. Effect statistics and tests of means are used to evaluate the data itself. Correlation coefficients measure the strength of the relationship between variables, while standard deviations describe how widely the observations are spread.
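As a rough illustration of the last two measures, the sketch below computes a Pearson correlation coefficient and a sample standard deviation using only the Python standard library. The function name `pearson_r` and the study-hours data are invented for this example and do not come from any real study.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)         # strength of the linear relationship
spread = statistics.stdev(scores)    # sample standard deviation of the scores
print(f"r = {r:.3f}, standard deviation of scores = {spread:.2f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, but by itself it says nothing about which variable, if either, is the cause.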
Tests of internal validity are used to determine whether the data support the cause-and-effect hypothesis or are consistent with the “null hypothesis.” The null hypothesis states that there is no effect or relationship, so that any apparent pattern is due to chance; it is the default position against which the favored hypothesis is tested. If the data allow the null hypothesis to be rejected, the favored hypothesis is supported and internal validity is established to some degree, provided that unidentified causative (confounding) variables have been ruled out. External validity concerns whether the result generalizes beyond the sample studied. P-values, Student's t-test, and effect statistics are used.
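The three tools named above can be sketched in a few lines. This is a minimal example under stated assumptions: the group data are invented, the t statistic is the Welch (unequal-variance) variant, the effect statistic is Cohen's d, and the p-value is estimated with a permutation test rather than looked up in a t distribution, because the Python standard library has no t distribution. All function names are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def cohens_d(a, b):
    """Effect size: mean difference in units of the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided p-value estimated by shuffling the group labels.

    Under the null hypothesis the labels are interchangeable, so we count
    how often a random relabeling gives a mean difference at least as
    large as the one observed.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical outcome scores for a treated group and a control group.
treated = [23, 25, 28, 30, 32]
control = [18, 20, 21, 24, 22]

t_stat = welch_t(treated, control)
d = cohens_d(treated, control)
p = permutation_p_value(treated, control)
print(f"t = {t_stat:.2f}, Cohen's d = {d:.2f}, permutation p = {p:.3f}")
```

A small p-value is evidence against the null hypothesis only; it does not rule out confounding variables, which remains a question of design.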
In probability statistics, there are tests of how useful data are in forecasting future outcomes based on current observations. Probability (or frequency) distributions, p-values, and confidence limits are used to judge how reliable such forecasts are.
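Confidence limits for a sample mean can be sketched as follows. The example assumes a normal approximation, which is reasonable for large samples; for a sample as small as the invented one here, limits based on the t distribution would be somewhat wider. The function name and the data are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence limits for the mean (normal approximation)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - margin, mean + margin


# Hypothetical measurements.
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 4.9, 5.1, 4.7, 5.3, 5.4]

low, high = mean_confidence_interval(sample)
print(f"95% confidence limits for the mean: {low:.2f} to {high:.2f}")
```

Asking for more confidence (say 99%) widens the interval; collecting more observations narrows it.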
In forecasting probability from samples, there are tests to determine how well the sample represents the whole population. There are checks on randomization, which is the holy grail of experimental design and is rarely fully achieved. There are also measures of the “fit” or “deviance” of an individual observation relative to the population norm.
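Two of these checks can be sketched briefly. The example below uses a chi-square goodness-of-fit statistic as one possible test of representativeness and a z-score as one possible measure of an observation's deviance from the population norm. The age groups, population shares, and counts are invented; the critical value 5.991 is the standard chi-square cutoff for 2 degrees of freedom at the 0.05 level.

```python
def chi_square_goodness_of_fit(observed_counts, population_shares):
    """Chi-square statistic comparing sample counts with population shares."""
    n = sum(observed_counts.values())
    statistic = 0.0
    for category, share in population_shares.items():
        expected = n * share  # count we would expect in a representative sample
        statistic += (observed_counts.get(category, 0) - expected) ** 2 / expected
    return statistic


def z_score(observation, population_mean, population_sd):
    """How many standard deviations an observation lies from the norm."""
    return (observation - population_mean) / population_sd


# Hypothetical age make-up of a population and of a sample of 100 people.
population_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
sample_counts = {"18-34": 45, "35-54": 35, "55+": 20}

chi2 = chi_square_goodness_of_fit(sample_counts, population_shares)
CRITICAL_VALUE = 5.991  # chi-square, 2 degrees of freedom, alpha = 0.05
looks_representative = chi2 <= CRITICAL_VALUE
print(f"chi-square = {chi2:.2f}, looks representative: {looks_representative}")

# Hypothetical norm: mean 100, standard deviation 15.
z = z_score(130, 100, 15)
print(f"z = {z:.1f}")
```

A statistic above the critical value suggests the sample's make-up differs from the population's by more than chance alone would explain, at least on the variable checked.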
In any test, the experimental design is the critical factor. Defining the variables, arriving at a clean hypothesis that clearly describes the expected cause-and-effect relationship, and clearly stating the null and alternative hypotheses are as important as gathering the data and crunching the numbers. So is understanding how the facts relate to each other: Is the relationship one-to-one, one-to-many, or recursive? Are the data categorical, or do they depend on arbitrary cutoffs? Is there some other issue with assigning quantitative values to qualitative variables?
With control and test groups and pre- and post-testing, are the individual independent and dependent variables, as well as the null hypotheses, well identified? If assignment is not randomized, what issues are expected to arise in generalizing the results to the larger population?
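A pre-test/post-test design with random assignment might be sketched as below. This is only an illustration: the participant IDs and scores are invented, the function names are hypothetical, and comparing mean gains is just one simple way to summarize such a design.

```python
import random
import statistics


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into test and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_gain(pre, post):
    """Average post-test minus pre-test score, paired by participant."""
    return statistics.fmean([after - before for before, after in zip(pre, post)])


# Hypothetical participant IDs 1..20, assigned at random.
test_group, control_group = randomly_assign(range(1, 21), seed=42)

# Hypothetical pre- and post-test scores for each group.
test_pre, test_post = [10, 12, 14], [13, 15, 20]
control_pre, control_post = [11, 12, 13], [12, 12, 14]

# The effect estimate is the gain in the test group beyond the control gain.
effect = mean_gain(test_pre, test_post) - mean_gain(control_pre, control_post)
print(f"estimated effect = {effect:.2f}")
```

Subtracting the control group's gain removes changes that would have happened anyway, such as practice effects from taking the test twice.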
With programs, are the components of the program well identified, and can they be quantified and tested together, or should they be implemented and tested independently?
In non-experimental studies, where surveys or participant observation are used, the effort to “test” for incomplete questions, biased questions, and other weaknesses in the methodology must also be rigorous. Multiple examinations by objective peers or mentors can reveal problems with non-experimental processes and procedures. Review of preexisting surveys and materials must be equally rigorous.
In short, before we can worry about statistical computations, we need good experimental design.