Populations vs. Samples

“A POPULATION is the total set of items that we are concerned about” (Meier, Brudney, and Bohte, 2011, p. 173).  In other words, the population is the complete set of individuals or items that share a common characteristic or set of characteristics.  We are often interested in population PARAMETERS, i.e., numerical values that are fixed and describe a characteristic of a population, such as the population mean (μ), variance (σ²), and standard deviation (σ).

“A SAMPLE is a subset of a population” (Meier, Brudney, and Bohte, 2011, p. 173).  There are two types of samples: probability samples and non-probability samples.  In a PROBABILITY SAMPLE, every member of the population has a KNOWN, nonzero CHANCE of being selected as part of the sample.  To construct a probability sample, you need a list of the entire population; this list serves as the SAMPLING FRAME from which the sample is drawn.  An example of a probability sample is a simple RANDOM SAMPLE, in which all members of the population have an equal chance of being selected.  In a NON-PROBABILITY SAMPLE, some members of the population may have NO CHANCE of being selected, or the probability of selection cannot be determined.  An example of a non-probability sample is a CONVENIENCE SAMPLE, in which members are selected because they are easy to contact or reach.
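The difference between the two sample types can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the sampling frame, the member names, and the sample size of 50 are all hypothetical and do not come from the text.

```python
import random

# Hypothetical sampling frame: a complete list of a 1,000-member population.
sampling_frame = [f"person_{i:04d}" for i in range(1000)]

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Simple random sample: every member has the same known chance of selection.
random_sample = rng.sample(sampling_frame, k=50)
selection_probability = len(random_sample) / len(sampling_frame)

# Convenience sample: take whoever is easiest to reach (here, simply the
# first 50 names on the list). Everyone else has no chance of selection.
convenience_sample = sampling_frame[:50]

print(selection_probability)  # 0.05
```

In the random sample, each person's chance of selection (50 out of 1,000) is known in advance.  In the convenience sample, 950 people could never have been selected, so no such probability can be attached to the result.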

“A STATISTIC is a measure that is used to summarize a sample” (Meier, Brudney, and Bohte, 2011, p. 173), such as the measures of central tendency (e.g., sample mean, X̄) and dispersion (e.g., sample standard deviation, s) for a variable.  In order to treat sample findings as GENERALIZABLE to the population (i.e., use sample statistics as reliable estimates of the population parameters), the sample should be a probability sample.
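To make the parameter/statistic distinction concrete, the sketch below computes μ and σ for a simulated population and X̄ and s for one random sample drawn from it.  The population (10,000 simulated ages) and the sample size of 100 are assumptions made purely for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: simulated ages of 10,000 residents (not real data).
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameters describe the whole population and are fixed.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD: divides by N

# Statistics summarize one sample and vary from sample to sample.
sample = rng.sample(population, k=100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)           # sample SD: divides by n - 1

print(f"mu={mu:.2f}  sigma={sigma:.2f}  x_bar={x_bar:.2f}  s={s:.2f}")
```

Rerunning the sampling step with a different seed would change X̄ and s but not μ and σ, which is why sample statistics are treated as estimates of the parameters.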

Why Are Only Probability Samples Generalizable?

Probability samples tend to be more representative of the population because selection is left to chance rather than to the researcher's judgment or to who happens to be available.  Furthermore, in probability sampling, the sampling distribution of the sample statistic (e.g., sample mean) can be determined based on statistical principles.  Thus, probability sampling allows us to calculate margins of error and confidence intervals at a chosen confidence level, which account for uncertainty in our sample statistics and capture how reliably they estimate population parameters.

In contrast, non-probability samples lack a clear and defined sampling distribution, making it impossible to accurately estimate the variability of the sample statistic.
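A small simulation can show what it means for a probability sample to have a determinable sampling distribution.  The sketch below assumes a hypothetical, deliberately skewed population of simulated incomes, draws many random samples of size 100, and compares the observed spread of the sample means with the standard error that theory predicts (σ divided by the square root of n).

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 10,000 simulated incomes (in thousands), skewed right.
population = [rng.expovariate(1 / 50) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 100        # size of each random sample
draws = 2_000  # number of repeated samples

# Approximate the sampling distribution of the mean by repeated random sampling.
sample_means = [statistics.fmean(rng.sample(population, k=n)) for _ in range(draws)]

observed_se = statistics.stdev(sample_means)
# Standard error predicted by theory (ignoring the small finite-population correction).
theoretical_se = sigma / math.sqrt(n)

print(f"mean of sample means = {statistics.fmean(sample_means):.2f} (mu = {mu:.2f})")
print(f"observed SE = {observed_se:.2f}, theoretical SE = {theoretical_se:.2f}")
```

The two standard errors should come out close to each other, and the sample means should center on μ.  No comparable prediction is available for a convenience sample, because the selection probabilities behind it are unknown.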

Inferential Statistics: The Basics

INFERENTIAL STATISTICS are “quantitative techniques [that can be used] to generalize from a sample to a population” (Meier, Brudney, and Bohte, 2011, p. 173).  When done correctly and with a large enough sample, the results obtained from a sample can be generalized to the population from which the sample was taken.  Such generalizations come with a known MARGIN OF ERROR, which defines a range around the sample estimate within which the true population parameter is expected to lie.  Adding the margin of error to, and subtracting it from, the point estimate gives a CONFIDENCE INTERVAL.  The accompanying CONFIDENCE LEVEL indicates how often intervals constructed this way would capture the population parameter if the sampling were repeated many times.  Margins of error, confidence intervals, and confidence levels help quantify how precise our estimates are.

For example, every time a news broadcast reports President Biden’s job approval rating, you are receiving inferences based on a sample of the population.  Naturally, it would be too costly and take too long to contact everyone in the United States to ask them how well Biden is doing as president.  Instead, a random sample of Americans is used to generate Biden’s job approval rating.  Then, depending on the sample size, the margin of error (MOE) is calculated; this accounts for the variability in the estimate that results from not asking every American how well Biden is doing.  If CNN reports that Biden’s job approval is 44% with an MOE of ±3 percentage points at a confidence level of 95%, we are 95% confident that the true job approval rating lies between 41% and 47%.
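The arithmetic behind the approval-rating example can be sketched as follows, using the standard large-sample formula for the margin of error of a proportion.  The sample size of 1,052 is a hypothetical value chosen because it yields roughly a ±3 point margin; the example itself does not state how many people were surveyed.

```python
import math

p_hat = 0.44  # reported approval rating from the example
n = 1052      # hypothetical sample size (not given in the example)
z = 1.96      # critical value for a 95% confidence level

# Standard error of a sample proportion, then the margin of error.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
moe = z * standard_error

# Confidence interval: point estimate plus or minus the margin of error.
lower, upper = p_hat - moe, p_hat + moe

print(f"MOE = ±{moe:.1%}; 95% CI = {lower:.0%} to {upper:.0%}")
# MOE = ±3.0%; 95% CI = 41% to 47%
```

Because n sits in the denominator under a square root, quadrupling the sample size only halves the margin of error, which is one reason national polls often settle for samples of about a thousand respondents.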