The Normal Distribution: The Basics

The NORMAL DISTRIBUTION (sometimes referred to as the Gaussian distribution) is a continuous probability distribution that describes many real-world variables: height, weight, IQ scores, test scores, measurement errors, reaction times, and so on. Knowing that a variable is normally distributed allows you to:

  • predict the likelihood (i.e., probability) of observing certain values
  • apply various statistical techniques that assume normality
  • establish confidence intervals and conduct hypothesis tests

Characteristics of the Normal Distribution

There are several key characteristics of the normal distribution:

  • mean, median, and mode are equal and located at the center of the distribution
  • the distribution is symmetric about the mean (i.e., the left half of the distribution is a mirror image of the right half); “Scores above and below the mean are equally likely to occur so that half of the probability under the curve (0.5) lies above the mean and half (0.5) below” (Meier, Brudney, and Bohte, 2011, p. 132)
  • the distribution resembles a bell-shaped curve (i.e., highest at the mean and tapers off towards the tails)
  • the standard deviation determines the SPREAD of the distribution (i.e., its height and width): a smaller standard deviation results in a steeper curve, while a larger standard deviation results in a flatter curve
  • the 68-95-99.7 RULE (often shortened to the 68-95-99 rule) can be used to summarize the distribution and calculate probabilities of event occurrence (see the sketch after this list):
    – approximately 68% of the data falls within ±1 standard deviation of the mean
    – approximately 95% of the data falls within ±2 standard deviations of the mean
    – approximately 99.7% of the data falls within ±3 standard deviations of the mean
  • there is always a chance that values will fall outside ±3 standard deviations of the mean, but the probability of occurrence is less than 1% (roughly 0.3%)
  • the tails of the distribution never touch the horizontal axis: an extreme value may be unlikely, but it is never impossible; thus, the upper and lower tails approach, but never reach, 0%
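
To make the rule concrete, here is a minimal sketch (Python, assuming the scipy library is available) that computes the probability of falling within ±1, ±2, and ±3 standard deviations of the mean:

    from scipy.stats import norm

    # Probability that a normal variable lands within ±k standard
    # deviations of its mean (the answer does not depend on the mean
    # or standard deviation, so the standard normal suffices)
    for k in (1, 2, 3):
        p = norm.cdf(k) - norm.cdf(-k)
        print(f"within ±{k} SD: {p:.4f}")

    # within ±1 SD: 0.6827
    # within ±2 SD: 0.9545
    # within ±3 SD: 0.9973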

Why the Normal Distribution is Common in Nature: The Central Limit Theorem

The CENTRAL LIMIT THEOREM states that the distribution of sample means for INDEPENDENT, IDENTICALLY DISTRIBUTED (IID) random variables will approximate a normal distribution, even when the variables themselves are not normally distributed, provided the sample is large enough. Thus, as long as we have a sufficiently large random sample, we can make inferences about population parameters (what we are interested in) from sample statistics (what we are often working with).
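
A quick simulation illustrates the theorem. The following sketch (Python, assuming the numpy library is available) draws repeated samples from a heavily skewed exponential distribution; the means of those samples nonetheless cluster in a near-normal shape around the true mean:

    import numpy as np

    rng = np.random.default_rng(42)
    n, trials = 100, 10_000

    # 10,000 means of samples of size 100 drawn from a skewed
    # (exponential) distribution with true mean 1.0
    sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

    # The theorem predicts: mean ≈ 1.0, standard error ≈ 1.0 / sqrt(100) = 0.1
    print(f"mean of sample means: {sample_means.mean():.3f}")  # ≈ 1.00
    print(f"sd of sample means:   {sample_means.std():.3f}")   # ≈ 0.10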

What Does “IID” Mean?

Variables are considered independent if the value of one provides no information about the value of another (note that this is not the same as being mutually exclusive; mutually exclusive events are, in fact, dependent).  Variables are considered identically distributed if they are drawn from the same probability distribution (i.e., normal, Poisson, etc.)

Do Outliers Matter?

In a normal distribution based on a large number of observations, it is unlikely that outliers will skew results.  If you are working with data involving fewer observations, outliers are more likely to skew results; in these situations, you should identify, investigate, and decide how to handle outliers.
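
As a minimal sketch (Python, assuming numpy is available, with invented data), here is how a single outlier distorts a small sample far more than a large one:

    import numpy as np

    small = np.array([50, 52, 48, 51, 49, 500])   # 6 observations, one outlier
    large = np.append(np.full(999, 50), 500)      # 1,000 observations, one outlier

    print(small.mean())   # 125.0 -> the outlier drags the mean far upward
    print(large.mean())   # 50.45 -> the outlier barely moves the mean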

Example of a Normal Distribution: IQ Tests

Because the IQ test has been given millions of times, IQ scores represent a normal probability distribution.  On the IQ test, the mean, median, and mode are equal and fall in the middle of the distribution (100).  The standard deviation on the IQ test is 15; applying the 68-95-99.7 rule, we can say with reasonable certainty (the sketch after the list reproduces these ranges):

  • 68% of the population will score between 85 and 115, or ±1 standard deviation from the mean
  • 95% of the population will score between 70 and 130, or ±2 standard deviations from the mean
  • 99.7% of the population will score between 55 and 145, or ±3 standard deviations from the mean
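
These ranges follow directly from the mean and standard deviation, as this minimal sketch (plain Python) shows:

    MEAN, SD = 100, 15  # IQ test parameters

    for k, pct in ((1, 68), (2, 95), (3, 99.7)):
        low, high = MEAN - k * SD, MEAN + k * SD
        print(f"~{pct}% of scores fall between {low} and {high}")

    # ~68% of scores fall between 85 and 115
    # ~95% of scores fall between 70 and 130
    # ~99.7% of scores fall between 55 and 145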

Rarely will you encounter a distribution as close to perfectly normal as IQ scores, but we can calculate z-scores to standardize (i.e., “normalize”) values from distributions that are not as normal as the IQ distribution.
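
A z-score is simply the number of standard deviations a value lies from the mean. A minimal sketch (plain Python), using the IQ parameters above:

    def z_score(x: float, mean: float, sd: float) -> float:
        """Number of standard deviations x lies from the mean."""
        return (x - mean) / sd

    print(z_score(130, 100, 15))  # 2.0 -> two standard deviations above the mean
    print(z_score(85, 100, 15))   # -1.0 -> one standard deviation below the mean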

Measures of Dispersion

Measures of DISPERSION tell us how much observations cluster around the expected value (i.e., the “typical” or average value) of a variable.  In other words, measures of dispersion tell us about the SPREAD of a distribution of values and the overall VARIATION in a measure.  This information can be used to understand the distribution of our data, identify the range of values in our data, and determine how much confidence we can have in our expected values.

Min, Max, Range, IQR, Variance, & Standard Deviation

There are six measures of dispersion (each is computed in the sketch following this list):

  • MIN and MAX – the minimum (lowest) and maximum (highest) values of a variable
  • RANGE – the difference between the maximum and minimum values of a variable; as a formula: Range = Max – Min
  • INTERQUARTILE RANGE (IQR) — the difference between the first quartile (Q1 / 25%) and the third quartile (Q3 / 75%), which corresponds to the range of the middle 50% of values of a variable; as a formula:  IQR = Q3 – Q1
  • VARIANCE — the average squared deviation of each value from the mean (i.e., the sum of the squared differences between each value and the mean, divided by the number of cases)
  • STANDARD DEVIATION — the average distance of each value from the mean, expressed in the same units as the data (i.e., the square root of the variance)
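
Here is a minimal sketch (Python, assuming numpy is available) computing all six measures for a small set of made-up values:

    import numpy as np

    values = np.array([62, 65, 68, 70, 71, 74, 77, 80, 85, 98])

    vmin, vmax = values.min(), values.max()
    vrange = vmax - vmin                      # Range = Max - Min
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1                             # IQR = Q3 - Q1
    variance = values.var()                   # divides by N, matching the definition above
    sd = values.std()                         # square root of the variance

    print(f"min={vmin}, max={vmax}, range={vrange}")
    print(f"IQR={iqr} (Q1={q1}, Q3={q3})")
    print(f"variance={variance:.2f}, sd={sd:.2f}")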

As is the case with measures of central tendency, we cannot calculate every measure of dispersion at every level of measurement.  Range and IQR require rank ordering of values, which in turn requires that the variable has direction.  Variance and standard deviation can only be calculated if values are associated with real numbers that have equal intervals of measurement between them.  Recall that the hierarchy of measurement illustrates that any statistic that can be calculated for a lower level of measurement can be legitimately calculated and used for higher levels of measurement.  Therefore:

  • because min and max can be calculated for nominal level variables, they can also be calculated on ordinal, interval, and ratio variables
  • because range and IQR can be calculated for ordinal variables, they can also be calculated on interval and ratio variables
  • because variance and standard deviation can be calculated for interval variables, they can also be calculated for ratio variables 

Measure              Description                                        Levels of Measurement
MIN/MAX              Minimum and maximum values of a variable           Nominal + Ordinal + Interval + Ratio
RANGE                Difference between the maximum and minimum         Ordinal + Interval + Ratio
                     values of a variable
IQR                  Range of the middle 50% of values of a variable    Ordinal + Interval + Ratio
VARIANCE             Average squared deviation of each value of a       Interval + Ratio
                     variable from the mean
STANDARD DEVIATION   Average distance of each value of a variable       Interval + Ratio
                     from the mean, expressed in the same units as
                     the data; square root of the variance

Measures of Dispersion

Variance vs. Standard Deviation

Variance and standard deviation capture the same information (indeed, the standard deviation is simply the square root of the variance).  Does it matter which measure we report?  In fact, it does!

Generally speaking, the standard deviation is more useful than the variance from an interpretation standpoint because it is expressed in the same units as the original data (unlike the variance, which is expressed in squared units of the original data).  This makes the standard deviation easier to understand and communicate: it allows for direct comparisons and provides a clearer picture of data spread, especially within the context of normal distributions.

For example, let’s assume we have a dataset of annual incomes and derive the following measures of central tendency and dispersion:

  • Mean income: 50,000 (dollars)
  • Standard deviation: 10,000 (dollars)
  • Variance: 100,000,000 (square dollars)

Interpreting the standard deviation is straightforward: most people’s incomes are within $10,000 of the average income of $50,000.  Interpreting the variance, however, is trickier: the average squared deviation from the mean income is 100,000,000 square dollars.  As you can see, the variance is less directly meaningful without further mathematical manipulation (i.e., taking the square root to find the standard deviation).
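
As a minimal sketch (plain Python), recovering the interpretable figure from the variance takes a single square root:

    import math

    variance = 100_000_000       # square dollars
    sd = math.sqrt(variance)     # back to plain dollars
    print(sd)                    # 10000.0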

Why Does Variance Use Square Units?

If we simply summed the differences between each data point and the mean, the positive and negative differences (associated with values that fall above and below the mean) would cancel each other out: deviations from the mean always sum to exactly zero.  Variance uses squared units to ensure all deviations from the mean are positive, which prevents this cancellation.
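
A minimal sketch (plain Python, with made-up data) of the cancellation:

    data = [2, 4, 6, 8, 10]
    mean = sum(data) / len(data)               # 6.0

    raw = [x - mean for x in data]             # [-4, -2, 0, 2, 4]
    squared = [(x - mean) ** 2 for x in data]  # [16, 4, 0, 4, 16]

    print(sum(raw))                  # 0.0 -> raw deviations cancel out
    print(sum(squared) / len(data))  # 8.0 -> the variance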

Working with squared units also has useful mathematical properties; for example, the variances of independent variables add together, while their standard deviations do not.