Statistics is the science of collecting data through surveys and experiments and interpreting that data with mathematical formulas. Many college students take at least one statistics class, because statistics is used in the social sciences, natural sciences, business, government, and mathematical fields. The following words and their descriptions are not a comprehensive list, but they cover the most common terms used in general statistics.
Population – the complete collection of items that is of interest. For instance, if a business owner is interested in hot dog sales, the population would be every hot dog sale the business makes.
Random Sample – a set of items drawn at random from a population, so that every item has an equal chance of being chosen, in order to collect data from which inferences can be made about the entire population. Instead of asking every person in New York City what brand of hot dog he or she prefers, a business owner could take a random sample of 200 people and, if Brand X is the favorite in the sample, infer that Brand X is probably the most popular brand in the entire population.
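The idea of drawing a random sample and generalizing from it can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 people, the brand names, and the preference weights are all made-up assumptions, not data from any real survey.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 people, each with one preferred brand.
# The brands and weights are invented for illustration only.
brands = ["Brand X", "Brand Y", "Brand Z"]
population = rng.choices(brands, weights=[5, 3, 2], k=10_000)

# Draw a random sample of 200 people without replacement.
sample = rng.sample(population, k=200)

# Count preferences in the sample and pick the most common brand.
counts = Counter(sample)
most_popular, votes = counts.most_common(1)[0]

print(f"Most popular brand in the sample: {most_popular} ({votes} of 200)")
```

The brand that wins in the sample is only an estimate of the population favorite; a different random sample could give a slightly different count.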
Mean – the arithmetic average of the scores in a population; it is a measure of central tendency. For example, if a college student earns an A (4 points), A (4 points), B (3 points), B (3 points), and a C (2 points), then the grade points total 4 + 4 + 3 + 3 + 2 = 16. The average is found by dividing the total by the number of grades, which is 5. The mean, which in this case is the GPA, equals 16 / 5 = 3.2.
Median – the middle point of a population, which cuts the distribution of scores in half; it is a measure of central tendency. Using the above example, sorted as 2, 3, 3, 4, 4, the median is the middle (third) score, which is 3.
Mode – the number that occurs most frequently; it is a measure of central tendency. Using the above example of 2, 3, 3, 4, 4, both 3 and 4 occur twice, so the data set has two modes, 3 and 4, and would be considered multi-modal (more specifically, bimodal).
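The three measures of central tendency above can be checked with Python's standard `statistics` module. This is a minimal sketch using the grade points from the example; the variable names are just illustrative choices.

```python
import statistics

# Grade points from the example: A, A, B, B, C
grades = [4, 4, 3, 3, 2]

mean = statistics.mean(grades)        # total divided by the number of grades
median = statistics.median(grades)    # middle value after sorting
modes = statistics.multimode(grades)  # every value tied for most frequent

print("mean:", mean)
print("median:", median)
print("modes:", sorted(modes))
```

`multimode` (available in Python 3.8 and later) is used instead of `mode` because this data set has more than one mode.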
Range – The range is the distance between the highest and lowest scores in a given population, found by subtracting the lowest score from the highest score. Using the above example of 2, 3, 3, 4, 4, the range would be 4 – 2, which equals 2.
Variance – The variance shows the dispersion of scores in a population. The variance is found by finding the mean, finding out how far each score deviates from the mean, squaring each deviation, and then averaging the squared deviations. Using the above example of 2, 3, 3, 4, 4 with a mean of 3.2, the variance would be found by adding (2 – 3.2) squared + (3 – 3.2) squared + (3 – 3.2) squared + (4 – 3.2) squared + (4 – 3.2) squared, which is 1.44 + 0.04 + 0.04 + 0.64 + 0.64 = 2.80, and dividing the total by the number of scores, which is 5. The answer is 2.80 / 5 = 0.56. Because it is an average of squared values, a variance can never be negative.
Standard Deviation – The standard deviation of a population is the square root of its variance. Using the above example of 2, 3, 3, 4, 4, which has a variance of 0.56, the standard deviation is the square root of 0.56, or about 0.748.
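The range, variance, and standard deviation calculations can be worked step by step in Python. The sketch below follows the population formulas described above (dividing by the number of scores) and then cross-checks the results against the standard library's `pvariance` and `pstdev`; the variable names are illustrative.

```python
import statistics

scores = [2, 3, 3, 4, 4]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Variance: average of the squared deviations from the mean.
mean = statistics.mean(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / len(scores)

# Standard deviation: square root of the variance.
std_dev = variance ** 0.5

print("range:", score_range)
print("variance:", round(variance, 4))
print("standard deviation:", round(std_dev, 4))

# Cross-check with the library's population versions.
print("pvariance:", round(statistics.pvariance(scores), 4))
print("pstdev:", round(statistics.pstdev(scores), 4))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p`) divide by one less than the number of scores, which is the convention for a sample rather than a whole population, so they give slightly larger values.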
Central Limit Theorem – A theorem of probability that describes the sampling distribution of means, that is, the distribution of the means of an infinite number of random samples of a given size, all drawn from the same population. It states that (1) the mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn; (2) the variance of the sampling distribution of means is equal to the variance of the population divided by the size of the samples; and (3) if the original population has a normal distribution, shown by a bell shape, the sampling distribution of means will also be normal. Even when the original population is not normal, the sampling distribution of means becomes approximately normal as the sample size grows, provided the population has a finite variance.
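A small simulation can make the first two claims of the theorem concrete. The sketch below is built on assumptions chosen for illustration only: a made-up, skewed population of 10,000 values, samples of size 30, and 5,000 samples standing in for the "infinite" number the theorem describes.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, deliberately skewed (non-normal) population.
population = [rng.expovariate(1.0) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)

sample_size = 30
num_samples = 5_000  # a finite stand-in for "infinitely many" samples

# Draw each sample with replacement and record its mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print("population mean:        ", round(pop_mean, 4))
print("mean of sample means:   ", round(mean_of_means, 4))
print("population variance / n:", round(pop_var / sample_size, 4))
print("variance of sample means:", round(var_of_means, 4))
```

Because only a finite number of samples is drawn, the paired values should be close but not identical. Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is skewed.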
Z-Score – The z-score indicates how far and in what direction a score deviates from the distribution mean and is expressed in units of standard deviation. The z-score equals the difference between the raw score and the mean of the population, divided by the standard deviation of the population: (raw score – mean) / standard deviation. Using the above example of 2, 3, 3, 4, 4, which has a mean of 3.2 and a standard deviation of about 0.748, the number 4 would have a z-score of (4 – 3.2) / 0.748, or 1.07 (rounded).
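The z-score formula translates directly into a short function. In this sketch the helper name `z_score` is a hypothetical choice, and the population standard deviation (`pstdev`) is used because the example treats the five scores as the whole population.

```python
import statistics

def z_score(raw_score, data):
    """Return how many population standard deviations raw_score lies from the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return (raw_score - mu) / sigma

scores = [2, 3, 3, 4, 4]

z_high = z_score(4, scores)  # above the mean, so positive
z_low = z_score(2, scores)   # below the mean, so negative

print("z-score of 4:", round(z_high, 2))
print("z-score of 2:", round(z_low, 2))
```

A positive z-score means the raw score is above the mean, and a negative z-score means it is below the mean.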
Null Hypothesis – a hypothesis that the independent variable will not have any effect on the dependent variable and that any differences between the experimental and control groups are caused by chance.
Statistical Significance – a result is considered statistically significant when it would be unlikely to occur by chance alone if the null hypothesis were true.
Correlation – a mutual relation between two or more things. Correlation does not, however, mean causation. For example, suppose that driving red cars is correlated with a greater chance of getting a speeding ticket. Driving a red car does not in itself cause a speeding ticket; some other factor, such as the kind of driver who tends to choose a red car, could account for both.
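Correlation Coefficient – a number from –1 to 1 that measures the strength and direction of the linear relationship between two variables. A value near 1 indicates a strong positive relationship, a value near –1 indicates a strong negative relationship, and a value near 0 indicates little or no linear relationship. In simple linear regression, its square, which ranges from 0 to 1, describes how well the predicted values match the real-life data, with 1 being the most accurate possible.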
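One common correlation coefficient, Pearson's r, can be computed by hand as follows. This is a sketch under stated assumptions: the function name `pearson_r` and the hours-studied and exam-score numbers are invented for illustration and do not come from real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
exam_scores = [52, 60, 61, 75, 80]

r = pearson_r(hours, exam_scores)
r_perfect = pearson_r(hours, [2 * h for h in hours])     # exactly linear, rising
r_inverse = pearson_r(hours, [-2 * h for h in hours])    # exactly linear, falling

print("r for the made-up study data:", round(r, 3))
print("r for a perfect rising line:", round(r_perfect, 3))
print("r for a perfect falling line:", round(r_inverse, 3))
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.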
Confounding Variable – a variable that falsely links the independent variable and the dependent variable or masks a correlation between the independent variable and the dependent variable in an experiment.