Sample Statistics

The SAMPLE MEAN (X̄) is a measure of central tendency that represents the average value of a variable in sample data.  It is calculated in the same way as the population mean (μ): by summing all the observations for a variable in the sample and dividing by the number of observations.

The SAMPLE STANDARD DEVIATION (s) is a measure of the dispersion or spread of the values in a sample around the sample mean.  It is calculated in a similar manner to the population standard deviation (σ), but with one notable difference: whereas the population formula divides by the total number of observations in the population (N), the sample formula divides by n-1 rather than by the number of observations in the sample (n).  Dividing by n-1 produces a slightly larger value (more variation) than dividing by n.  The reason is straightforward: the deviations are measured around the sample mean, which is calculated from the same data, and the sum of squared deviations around the sample mean is never larger than the sum of squared deviations around the true population mean.  As a result, dividing by n would tend to understate the variation in the population.  Dividing by n-1 (known as Bessel's correction) removes this bias from the sample variance (s²); in practical terms, it errs on the side of caution by assuming larger variation.
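The difference between dividing by n and dividing by n-1 can be seen with a small calculation.  This is a minimal sketch using Python's standard-library statistics module; the eight data values are made up for illustration.  statistics.stdev divides by n-1 (the sample standard deviation, s), while statistics.pstdev divides by the number of values (the population formula).

```python
import math
import statistics

# Hypothetical sample data (made up for illustration)
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

# Sample mean: sum of the observations divided by the number of observations
x_bar = sum(sample) / n

# Sum of squared deviations from the sample mean
ss = sum((x - x_bar) ** 2 for x in sample)

# Dividing by n (the population formula applied to sample data) ...
sd_divide_by_n = math.sqrt(ss / n)
# ... versus dividing by n - 1 (the sample standard deviation, s)
s = math.sqrt(ss / (n - 1))

# The standard library provides both versions directly
assert math.isclose(s, statistics.stdev(sample))
assert math.isclose(sd_divide_by_n, statistics.pstdev(sample))

print(f"mean = {x_bar}")                       # mean = 5.0
print(f"divide by n: {sd_divide_by_n:.3f}")    # divide by n: 2.000
print(f"divide by n-1 (s): {s:.3f}")           # divide by n-1 (s): 2.138
```

The n-1 version is larger, and the gap between the two shrinks as the sample size grows.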

The STANDARD ERROR OF THE MEAN (s.e.) measures how much the sample mean (X̄) is expected to vary from the true population mean (μ) from one sample to the next; in other words, it tells us how precise the sample mean is as an estimate of the population mean.  It is calculated by dividing the sample standard deviation by the square root of the number of observations (s.e. = s/√n), so it is inversely related to the square root of the sample size (n): as the sample size increases, the standard error of the mean decreases, and quadrupling the sample size cuts it in half.  As the standard error of the mean decreases, the margin of error and confidence intervals narrow, and the sample mean becomes a more precise and reliable estimate of the population mean.  This is tied to the central limit theorem: 

Larger samples tend to provide a better representation of the population

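→ As the sample size grows, the sampling distribution of the sample mean (the distribution of X̄ across many repeated samples) becomes approximately normal, even when the underlying data are not normally distributed

→ As the sampling distribution of the sample mean approaches a normal distribution, we can make more accurate and robust inferences about the population mean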
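A short simulation can make the standard error formula concrete.  This is a sketch under stated assumptions: the data values are made up, and the "population" is a skewed exponential distribution with mean 1 and standard deviation 1, chosen only because it is clearly not normal.  For each sample size, the code draws many samples, records each sample mean, and compares the spread of those means with σ/√n.

```python
import math
import random
import statistics

def standard_error(values):
    """Standard error of the mean: s / sqrt(n), where s divides by n - 1."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample data (made up for illustration)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"s.e. = {standard_error(data):.3f}")   # s.e. = 0.756

# Simulation: a skewed (exponential) population with mean 1 and sigma 1
random.seed(1)  # fixed seed so the run is reproducible
REPS = 5000     # number of repeated samples drawn for each sample size

def spread_of_sample_means(n, reps=REPS):
    """Standard deviation of the means of `reps` samples of size n."""
    means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
             for _ in range(reps)]
    return statistics.stdev(means)

observed = {}
for n in (4, 16, 64):
    observed[n] = spread_of_sample_means(n)
    predicted = 1.0 / math.sqrt(n)   # sigma / sqrt(n) with sigma = 1
    print(f"n = {n:2d}: spread of sample means = {observed[n]:.3f}, "
          f"sigma/sqrt(n) = {predicted:.3f}")
```

Each fourfold increase in n should roughly halve the spread of the sample means.  Plotting a histogram of the simulated means for each n would also show their shape becoming more bell-shaped as n grows, which is what the central limit theorem describes.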

Beware of Outliers!

How much outliers (i.e., extreme values) skew results depends in large part on sample size.  In small samples, outliers can disproportionately affect the sample mean, the sample standard deviation, and, as a result, the standard error of the mean.  This, in turn, can lead us to make generalizations about the population parameters based on inaccurate information.  Thus, it is important to identify, investigate, and decide how to handle outliers (i.e., include, exclude, adjust/transform, or consider separately) based on their potential impact and the context of the study. 
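The effect of sample size on outlier influence can be checked by adding one extreme value to a small and a large sample.  This is a minimal sketch with hypothetical numbers: the large sample simply repeats the small sample's values 20 times, so both have the same typical values, and the outlier value of 100 is arbitrary.

```python
import math
import statistics

def summarize(values):
    """Return (sample mean, sample standard deviation, standard error)."""
    s = statistics.stdev(values)
    return statistics.fmean(values), s, s / math.sqrt(len(values))

# Hypothetical data: the same typical values in a small and a large sample
small = [10, 12, 11, 13, 12]   # n = 5
large = small * 20             # n = 100
OUTLIER = 100                  # a single extreme value

mean_shift = {}
for label, values in (("small", small), ("large", large)):
    mean_before, s_before, se_before = summarize(values)
    mean_after, s_after, se_after = summarize(values + [OUTLIER])
    mean_shift[label] = mean_after - mean_before
    print(f"{label}: mean {mean_before:.2f} -> {mean_after:.2f}, "
          f"s {s_before:.2f} -> {s_after:.2f}, "
          f"s.e. {se_before:.2f} -> {se_after:.2f}")
```

In the small sample the single outlier more than doubles the mean; in the large sample it moves the mean by less than one unit.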

Random Samples and Post-Stratification Weighting

With random samples, there is no guarantee that a sample will perfectly represent the population on the characteristics that may be relevant to the concept we are seeking to understand, explain, and/or predict, so POST-STRATIFICATION WEIGHTING is often applied.  With post-stratification weighting, sample data are WEIGHTED (adjusted) so the sample better mirrors the overall population.  This helps correct for potential biases and makes the sample more representative of the population.  

Post-stratification weighting involves the following steps:

  1. Identify STRATA, or subgroups that share a specific characteristic, such as age, race, gender, income, education level, or other socio-economic and/or demographic factors
  2. Identify the proportions of the population falling into each STRATUM (singular of “strata”)
    • NOTE: To calculate population proportions, population-level data (such as census data) must be available for the characteristics you plan to use to weight your sample data
  3. Calculate the proportion of the sample that falls into each stratum (ex: the percentage of the sample who are male)
  4. Calculate a weight for each stratum: usually, the ratio of the population proportion for a stratum to the sample proportion for that stratum
    • A weight of “1” means that the proportion of the sample for that stratum matches the proportion of the population
    • A weight of greater than/less than “1” means the proportion of the sample for that stratum does not match the proportion of the population
  5. Apply the weights to sample data in each stratum, which adjusts the survey data to better reflect the distribution of these characteristics in the overall population
    • When the weight associated with a stratum = 1, no adjustment is necessary because that stratum's share of the sample already matches its share of the population
    • When the weight associated with a stratum > 1, that stratum is said to have been UNDERREPRESENTED (i.e., included in the sample in lower proportions than what is found in the population); this adjusts values for this stratum so they count more heavily in the overall analysis
    • When the weight associated with a stratum < 1, that stratum is said to have been OVERREPRESENTED (i.e., included in the sample in higher proportions than what is found in the population); this adjusts values for this stratum so they count less heavily in the overall analysis
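The steps above can be sketched in a few lines of code.  Everything here is hypothetical: a survey of 100 respondents (60 women, 40 men) answering a yes/no approval question, weighted to assumed census proportions of 50% female and 50% male.  The helper names post_stratification_weights and weighted_mean are illustrative, not from any particular survey package.

```python
from collections import Counter

def post_stratification_weights(sample_strata, population_props):
    """Weight for each stratum = population proportion / sample proportion."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return {stratum: population_props[stratum] / (count / n)
            for stratum, count in counts.items()}

def weighted_mean(values, weights):
    """Each value counts in proportion to its respondent's weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical survey: 60 women and 40 men; 1 = approves, 0 = does not
strata = ["female"] * 60 + ["male"] * 40
approves = [1] * 36 + [0] * 24 + [1] * 16 + [0] * 24
# Hypothetical census proportions for each stratum
population = {"female": 0.5, "male": 0.5}

weights = post_stratification_weights(strata, population)
respondent_weights = [weights[s] for s in strata]

unweighted = sum(approves) / len(approves)
weighted = weighted_mean(approves, respondent_weights)

for stratum, w in weights.items():
    print(f"{stratum}: {w:.3f}")   # female: 0.833 (over), male: 1.250 (under)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")  # 0.52, 0.50
```

Because women are overrepresented and approve at a higher rate (60% vs. 40% for men), the unweighted estimate is pulled upward; weighting restores each group to its population share.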

Generalizability: A Function of Representativeness and Sample Size

Generalizability relies heavily on the representativeness of the sample (i.e., the extent to which a sample’s composition mirrors that of the population).  With probability sampling, members of a population have a known chance of selection, which minimizes selection bias and tends to produce a sample whose composition reflects the characteristics of the population.  With non-probability sampling, however, SAMPLING BIAS and/or SELF-SELECTION BIAS may result in a non-representative sample, one whose composition does not accurately reflect the characteristics of the population.

Generalizability also relies heavily on sample size.  Larger sample sizes “typically do a better job at capturing the characteristics present in the population than do smaller samples” (Meier, Brudney, and Bohte, 2011, p. 178).  Larger sample sizes tend to provide more reliable sample statistics and more precise estimates of population parameters.  Furthermore, larger sample sizes increase the statistical power of a study, making it easier to identify statistically significant relationships and differences.

Populations vs. Samples

“A POPULATION is the total set of items that we are concerned about” (Meier, Brudney, and Bohte, 2011, p. 173).  In other words, the population is the complete set of individuals or items that share a common characteristic or set of characteristics.  We are often interested in population PARAMETERS, i.e., fixed numerical values that describe a characteristic of a population, such as the population mean (μ), variance (σ²), and standard deviation (σ).  

“A SAMPLE is a subset of a population” (Meier, Brudney, and Bohte, 2011, p. 173).  There are two different types of samples: probability samples and non-probability samples.  In a PROBABILITY SAMPLE, all members of the population have a KNOWN CHANCE of being selected as part of the sample.  To construct a probability sample, you will typically need to obtain a list of the entire population; this list then serves as the SAMPLING FRAME from which the sample will be selected/drawn.  An example of a probability sample is a RANDOM SAMPLE, in which all members of the population have an equal chance of being selected.  In a NON-PROBABILITY SAMPLE, the probability of selection cannot be determined, and some members of the population may have NO CHANCE of being selected as part of the sample.  An example of a non-probability sample is a CONVENIENCE SAMPLE, in which the sample is selected based on convenience (i.e., as a result of being easy to contact or reach).
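Drawing a random sample from a sampling frame can be sketched with Python's random.sample, which selects items without replacement so that every member of the list is equally likely to be chosen.  The frame of 5,000 "residents" and the sample size of 200 are hypothetical.

```python
import random

# Hypothetical sampling frame: a list of every member of the population
sampling_frame = [f"resident_{i:04d}" for i in range(1, 5001)]
k = 200  # desired sample size

random.seed(2024)  # fixed seed only so the example is reproducible
sample = random.sample(sampling_frame, k)  # without replacement, equal chance

# Each member's known chance of selection is k / N
selection_prob = k / len(sampling_frame)
print(len(sample), selection_prob)   # 200 0.04
print(sample[:3])                    # first three selected residents
```

A convenience sample, by contrast, has no equivalent of selection_prob: there is no frame and no known chance of selection, which is why its sampling variability cannot be calculated.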

“A STATISTIC is a measure that is used to summarize a sample” (Meier, Brudney, and Bohte, 2011, p. 173), such as the measures of central tendency (ex: sample mean, X̄) and dispersion (ex: sample standard deviation, s) for a variable.  In order to treat sample findings as GENERALIZABLE to the population (i.e., use sample statistics as reliable estimates of the population parameters), the sample should be a probability sample. 

Why Are Only Probability Samples Generalizable?

Probability samples tend to be more representative of the population.  Furthermore, in probability sampling, the sampling distribution of the sample statistic (e.g., sample mean) can be determined based on statistical principles.  Thus, probability sampling allows us to calculate measures such as margins of error and confidence levels, which account for uncertainty in our sample statistics and capture how reliably they estimate population parameters. 

In contrast, non-probability samples lack a clear and defined sampling distribution, making it impossible to accurately estimate the variability of the sample statistic.

Inferential Statistics: The Basics

INFERENTIAL STATISTICS are “quantitative techniques [that can be used] to generalize from a sample to a population” (Meier, Brudney, and Bohte, 2011, p. 173).  When done correctly and with a large enough sample, the results obtained from a sample can be generalized to the population from which the sample was taken, with a known MARGIN OF ERROR and CONFIDENCE LEVEL.  The margin of error provides a range around a sample estimate (the point estimate) within which the true population parameter is expected to lie; once this range is added to and subtracted from our point estimate, the resulting interval is called a CONFIDENCE INTERVAL.  The confidence level indicates how reliable this procedure is: a 95% confidence level means that if we drew many samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population parameter.  Margin of error, confidence intervals, and confidence levels help quantify how precise our estimates are.
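For a sample mean, the margin of error is a critical value multiplied by the standard error (s/√n), and the confidence interval is the point estimate plus and minus that margin.  This is a sketch under stated assumptions: it uses the normal (z) critical value, which is a reasonable approximation for large samples (small samples usually use a t critical value instead), and the summary numbers (X̄ = 50, s = 10, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval_mean(x_bar, s, n, level=0.95):
    """Normal-approximation margin of error and confidence interval for a mean."""
    se = s / math.sqrt(n)                             # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)     # about 1.96 for 95%
    moe = z * se                                      # margin of error
    return moe, (x_bar - moe, x_bar + moe)

moe, (low, high) = confidence_interval_mean(x_bar=50.0, s=10.0, n=100)
print(f"MOE = ±{moe:.2f}; 95% CI = ({low:.2f}, {high:.2f})")
# MOE = ±1.96; 95% CI = (48.04, 51.96)
```

Raising the confidence level (e.g., level=0.99) increases the critical value and widens the interval; increasing n shrinks the standard error and narrows it.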

For example, every time you hear a news broadcast report President Biden’s job approval rating, you are receiving inferences based on a sample of the population.  Naturally, it would be too costly and take too long to contact everyone in the United States to ask them how well Biden is doing as president.  Instead, pollsters survey a random sample of Americans to estimate Biden’s job approval rating.  Then, based largely on the sample size, they calculate the margin of error (MOE), which accounts for the variability in their estimate that results from not asking every American how well Biden is doing.  If CNN reports that Biden’s job approval is 44% with a margin of error of ±3 percentage points at a 95% confidence level, we are 95% confident that the true job approval rating lies between 41% and 47%.
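For a proportion such as an approval rating, a common textbook formula for the margin of error is z × √(p(1 − p)/n).  The sketch below assumes a simple random sample and a hypothetical sample size of 1,000; published poll margins are often adjusted for weighting and survey design, so real figures may differ.  It also inverts the formula to estimate how many respondents a ±3-point margin would require.

```python
import math
from statistics import NormalDist

def z_critical(level=0.95):
    """Two-sided normal critical value (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def proportion_moe(p_hat, n, level=0.95):
    """Margin of error for a sample proportion under simple random sampling."""
    return z_critical(level) * math.sqrt(p_hat * (1 - p_hat) / n)

def required_n(p_hat, target_moe, level=0.95):
    """Smallest n whose margin of error is at most target_moe."""
    return math.ceil((z_critical(level) / target_moe) ** 2 * p_hat * (1 - p_hat))

p_hat = 0.44   # reported approval rating
n = 1000       # hypothetical number of respondents
moe = proportion_moe(p_hat, n)
print(f"±{moe * 100:.1f} points -> {100 * (p_hat - moe):.1f}% to "
      f"{100 * (p_hat + moe):.1f}%")          # ±3.1 points -> 40.9% to 47.1%
print(required_n(p_hat, 0.03))                # 1052
```

Because n sits under a square root, halving the margin of error requires roughly four times as many respondents.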