Sample Statistics

The SAMPLE MEAN (X̄) is a measure of central tendency that represents the average value of a variable in sample data.  It is calculated in the same way as the population mean (μ): by summing all the observations for a variable in the sample and dividing by the number of observations (n).
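
A minimal sketch of the calculation in Python (using the standard library's statistics module; the observations are invented for illustration):

    import statistics

    # A small sample of observations for one variable (invented values)
    sample = [4, 8, 6, 5, 7]

    # Sample mean: sum of the observations divided by the number of observations
    x_bar = sum(sample) / len(sample)          # 6.0
    print(x_bar == statistics.mean(sample))    # True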

The SAMPLE STANDARD DEVIATION (s) is a measure of the dispersion or spread of the values in a sample around the sample mean.  It quantifies the amount of variation in a set of values.  It is calculated in a similar manner to the population standard deviation (σ), but with one notable difference: instead of dividing the sum of squared deviations by the number of observations (n), we divide by n-1.  Dividing by n-1 produces a larger value (more variation) than dividing by n.  The reason this is appropriate when working with sample data is that the deviations are measured from the sample mean, which is itself estimated from the same data, rather than from the true population mean; dividing by n would therefore systematically understate the variation in the population, and dividing by n-1 (Bessel's correction) corrects for that bias.
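
To make the n-1 divisor concrete, here is a short Python sketch with the same invented values; statistics.stdev uses the n-1 divisor and statistics.pstdev uses n:

    import statistics

    sample = [4, 8, 6, 5, 7]
    n = len(sample)
    x_bar = sum(sample) / n

    # Sum of squared deviations from the sample mean
    ss = sum((x - x_bar) ** 2 for x in sample)

    s = (ss / (n - 1)) ** 0.5      # sample standard deviation (divide by n-1)
    sigma_hat = (ss / n) ** 0.5    # population formula applied to the sample (divide by n)

    print(s > sigma_hat)                                                  # True: n-1 gives the larger value
    print(round(s, 10) == round(statistics.stdev(sample), 10))            # True
    print(round(sigma_hat, 10) == round(statistics.pstdev(sample), 10))   # True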

The STANDARD ERROR OF THE MEAN (s.e.) is a measure of how much the sample mean (X̄) is expected to vary from the true population mean (μ); in other words, it tells us how precise the sample mean is as an estimate of the population mean.  It is calculated by dividing the sample standard deviation by the square root of the number of observations: s.e. = s / √n.  Because the standard error is inversely related to the square root of the sample size (n), it decreases as the sample size increases.  As the standard error of the mean decreases, the margin of error and confidence intervals narrow, and the sample mean becomes a more precise and reliable estimate of the population mean.  This is tied to the central limit theorem:

Larger samples tend to provide a better representation of the population

→ The larger (and more representative) the sample, the more closely the distribution of the sample mean across repeated samples approaches a normal distribution, even when the underlying data are not normal

→ Because the distribution of the sample mean is approximately normal, we can make more accurate and robust inferences about the population, as the sketch below illustrates
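
As a hedged illustration of both points (the s / √n calculation and the effect of sample size), the Python sketch below draws samples of increasing size from a deliberately skewed, simulated population; the population and seed are invented, so exact outputs will vary:

    import random
    import statistics

    random.seed(1)

    def standard_error(sample):
        # s.e. = s / sqrt(n)
        return statistics.stdev(sample) / len(sample) ** 0.5

    # A skewed (non-normal) simulated population
    population = [random.expovariate(1.0) for _ in range(100_000)]

    for n in (10, 100, 1000):
        sample = random.sample(population, n)
        print(n, round(standard_error(sample), 4))

    # The standard error shrinks roughly by a factor of √10 each time n grows tenfold,
    # so the sample mean pins down the population mean more and more precisely.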

Beware of Outliers!

Whether outliers (i.e., extreme values) are likely to skew results depends in large part on sample size.  In small samples, outliers can disproportionately affect the sample mean, the sample standard deviation, and, as a result, the standard error of the mean.  This, in turn, can lead us to make generalizations about the population parameters based on inaccurate information.  Thus, it is important to identify, investigate, and decide how to handle outliers (i.e., include, exclude, adjust/transform, or consider separately) based on their potential impact and the context of the study.
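
A rough Python sketch of this point with invented values: a single extreme observation in a small sample shifts the sample mean and inflates both the sample standard deviation and the standard error.

    import statistics

    def summarize(sample):
        n = len(sample)
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)
        return round(x_bar, 2), round(s, 2), round(s / n ** 0.5, 2)   # X̄, s, s.e.

    clean = [4, 8, 6, 5, 7]
    with_outlier = [4, 8, 6, 5, 70]    # one extreme value

    print(summarize(clean))            # (6.0, 1.58, 0.71)
    print(summarize(with_outlier))     # (18.6, 28.77, 12.87)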