Statistical Significance: How Sure Should a Person Be?

When researchers say that their results are STATISTICALLY SIGNIFICANT, they mean that the observed effect or relationship in the data would be unlikely to occur by chance alone if the null hypothesis were true.  Thus, statistical significance tells us whether the sample results we observe are strong enough to reject the null hypothesis, according to a predefined threshold (i.e., the significance level α).

Statistically significant results obtained from a probability sample can be generalized to the population from which the sample was taken.  For example, if I find a statistically significant relationship between voting and sex (women are more likely to vote than men) in a random sample of Americans using a threshold of α=0.05, I can reject the null hypothesis of no relationship and infer that this relationship exists in the United States: throughout the entire country, women are more likely to vote than men.  The threshold of α=0.05 means that, if no such relationship existed in the population, a result this strong would appear in no more than 5% of random samples.

Choosing a Significance Level

What significance level (α) should we use to determine whether results are statistically significant?

In social science research, the significance level at which results are considered statistically significant is usually α=0.05, meaning we accept a 5% chance of finding a relationship between two variables in the sample when none exists in the population (i.e., when the result is due to random chance).  However, in some social science research (such as some areas of political behavior research), α=0.10 is used to identify whether results are statistically significant, meaning we accept a 10% chance of such a false finding.  If you think about it, choosing a less stringent threshold here makes sense: human behavior is only so predictable.  Thus, the choice of significance level is sometimes driven by the concept being researched.

Sample size can also affect the choice of significance level.  As sample size increases, the standard error decreases, which leads to more precise estimates of the population parameters and makes it easier to detect smaller effects.  Therefore, larger sample sizes increase the likelihood of detecting statistically significant effects with a smaller α, relative to smaller sample sizes.  However, there is a trade-off when it comes to sample sizes: larger samples are more costly than smaller samples in terms of the time required to recruit the sample and collect data and the financial cost associated with survey administration and data collection.  Time and money are finite resources: researchers and administrators only have so much time, and so much money, that can be dedicated to a given project.  As such, we sometimes adjust our level of significance to accommodate our sample size, opting to proceed with a smaller sample size (n) and larger significance level (α).
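The link between sample size and standard error can be illustrated with a short Python sketch.  It is only an illustration under stated assumptions: the function names, the effect of 0.2, and the standard deviation of 1.0 are hypothetical values chosen for the example, and the test shown is a simple two-sided z-test rather than any particular method from the text.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of a sample mean: sd divided by the square root of n."""
    return sd / sqrt(n)


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value from a z-test of `effect` against a null of zero.

    Assumes a normal sampling distribution (a simplifying assumption).
    """
    z = effect / standard_error(sd, n)
    # Probability in both tails beyond |z| under the standard normal curve.
    return 2 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    # Hypothetical values: the same observed effect and spread, two sample sizes.
    effect, sd = 0.2, 1.0
    for n in (50, 2500):
        se = standard_error(sd, n)
        p = two_sided_p_value(effect, sd, n)
        print(f"n={n}: standard error={se:.4f}, p-value={p:.4g}")
```

With the same observed effect, the larger sample yields a much smaller standard error and therefore a much smaller p-value, which is why a stricter α becomes practical as n grows.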

For instance, when researching crime rates using data from a sample of 50 cities and townships (n=50), you may decide to adopt a threshold of α=0.10 to increase the likelihood of finding statistically significant effects.  If, on the other hand, the sample consists of 2,500 cities and townships (n=2500), you may decide to adopt a threshold of α=0.01.  As long as you decide on a level of significance before conducting statistical analysis, and accurately report α alongside the results, either of these is perfectly acceptable.  You cannot adjust α after conducting statistical analysis to accommodate the results.

Determining Sample Size for a Significance Level

At times, you may decide that you want to report results at a specific significance level and then let α drive the decision regarding how large a sample you will need to detect a relationship.  In these situations, “the ideal sample size for any problem is a function of (1) the amount of error that can be tolerated, (2) the confidence one wants to have in the error estimate, and (3) the standard deviation of the population” (Meier, Brudney, and Bohte, 2011, p. 203).  Specifically, the sample size should be equal to the squared value obtained when the critical value (i.e., t-score) associated with the desired α is multiplied by the estimated sample standard deviation, and then divided by the maximum margin of error that can be tolerated: n = (t × s / E)², where t is the critical value, s is the estimated standard deviation, and E is the maximum tolerable margin of error.
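The calculation described above can be sketched in Python.  This is a minimal sketch, not a definitive implementation: the function name and the example inputs (a standard deviation of 15 and a margin of error of 2) are hypothetical, and it substitutes the two-sided critical value from the standard normal distribution (a z-score) for the t-score, which is a common approximation when the sample is expected to be reasonably large.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(alpha, sd, margin_of_error):
    """Approximate n = (critical value * sd / margin of error) squared.

    Assumptions: a two-sided critical value taken from the standard normal
    distribution stands in for the t-score, and the result is rounded up
    because a sample cannot contain a fraction of a case.
    """
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((critical_value * sd / margin_of_error) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: estimated standard deviation 15, tolerable error 2.
    for alpha in (0.10, 0.05, 0.01):
        n = required_sample_size(alpha, sd=15, margin_of_error=2)
        print(f"alpha={alpha}: required n is about {n}")
```

Holding the standard deviation and margin of error fixed, a smaller α calls for a larger sample, and halving the tolerable margin of error roughly quadruples the required n because the ratio is squared.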