Measures of DISPERSION tell us how closely the observations cluster around the expected value (i.e., the “typical” or average value) of a variable. In other words, measures of dispersion describe the SPREAD of a distribution of values and the overall VARIATION in a measure. Researchers can use this information to understand the distribution of their data, identify the range of values their data take, and judge how much confidence they can place in their expected values.
Min, Max, Range, IQR, Variance, & Standard Deviation
We will focus on six common measures of dispersion:
- MIN and MAX – the minimum (lowest) and maximum (highest) values of a variable
- RANGE – the difference between the maximum and minimum values of a variable; as a formula: Range = Max – Min
- INTERQUARTILE RANGE (IQR) – the difference between the third quartile (Q3 / 75th percentile) and the first quartile (Q1 / 25th percentile), which corresponds to the range of the middle 50% of values of a variable; as a formula: IQR = Q3 – Q1
- VARIANCE – the average squared deviation of each value from the mean (i.e., subtract the mean from each value, square each difference, add up the squared differences, and divide by the number of cases); as a formula: Variance = Σ(x – Mean)² / N, where x is each value and N is the number of cases
- STANDARD DEVIATION – roughly, the typical distance of values from the mean, expressed in the same units as the data (i.e., the square root of the variance); as a formula: SD = √Variance
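As a minimal sketch, all six measures can be computed with Python's standard-library `statistics` module. The scores below are hypothetical. The variance and standard deviation use the population versions (`pvariance`, `pstdev`), which divide by the number of cases. `statistics.variance` and `statistics.stdev` instead divide by N – 1, the usual choice when estimating from a sample. Quartiles also have several competing conventions, so the IQR from `statistics.quantiles(..., method="inclusive")` may differ slightly from what other software reports.

```python
import statistics

# Hypothetical test scores (illustrative values only)
scores = [55, 61, 67, 70, 72, 75, 78, 83, 90, 96]

minimum = min(scores)             # lowest value
maximum = max(scores)             # highest value
value_range = maximum - minimum   # Range = Max - Min

# Three cut points splitting the data into quarters: Q1, median, Q3.
# "inclusive" is one of several quartile conventions.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                     # IQR = Q3 - Q1

# Population formulas: sum of squared deviations divided by N
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)   # square root of the variance

print(f"Min={minimum}, Max={maximum}, Range={value_range}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Variance={variance:.2f}, SD={std_dev:.2f}")
```

Switching `method` to `"exclusive"` (the function's default) moves Q1 and Q3 for this dataset and therefore changes the IQR. This is why IQRs reported by different tools do not always agree.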
As is the case with measures of central tendency, not every measure of dispersion can be calculated for every level of measurement. Range and IQR require rank ordering of values, which in turn requires that the variable has direction. Variance and standard deviation can only be calculated if values are real numbers with equal intervals of measurement between them. Recall that, according to the hierarchy of measurement, any statistic that can be calculated for a lower level of measurement can legitimately be calculated and used for higher levels of measurement. Therefore:
- because min and max can be calculated for nominal level variables, they can also be calculated on ordinal, interval, and ratio variables
- because range and IQR can be calculated for ordinal variables, they can also be calculated on interval and ratio variables
- because variance and standard deviation can be calculated for interval variables, they can also be calculated for ratio variables
Measure | Description | Levels of Measurement |
---|---|---|
MIN/MAX | Minimum and maximum values of a variable | Nominal + Ordinal + Interval + Ratio |
RANGE | Difference between the maximum and minimum values of a variable | Ordinal + Interval + Ratio |
IQR | Range of the middle 50% of values of a variable | Ordinal + Interval + Ratio |
VARIANCE | Average squared deviation of each value of a variable from the mean | Interval + Ratio |
STANDARD DEVIATION | Typical distance of values of a variable from the mean, expressed in the same units as the data; square root of the variance | Interval + Ratio |
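The hierarchy rule in the table works like a lookup. You record the lowest level at which each measure is valid, then allow that measure for that level and every higher one. The sketch below is a hypothetical helper. `allowed_measures`, `LEVELS` and `LOWEST_LEVEL` are illustrative names, not part of any library.

```python
# Levels of measurement, ordered from lowest to highest
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each measure can be calculated (per the table)
LOWEST_LEVEL = {
    "min/max": "nominal",
    "range": "ordinal",
    "iqr": "ordinal",
    "variance": "interval",
    "standard deviation": "interval",
}


def allowed_measures(level: str) -> list[str]:
    """Return the measures of dispersion that can be calculated at `level`.

    A measure is allowed when its lowest valid level sits at or below
    `level` in the hierarchy of measurement.
    """
    rank = LEVELS.index(level)
    return [
        measure
        for measure, lowest in LOWEST_LEVEL.items()
        if LEVELS.index(lowest) <= rank
    ]


for level in LEVELS:
    print(level, "->", allowed_measures(level))
```

For example, `allowed_measures("ordinal")` returns `["min/max", "range", "iqr"]`, and `allowed_measures("ratio")` returns all five entries.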
Variance vs. Standard Deviation
Variance and standard deviation capture the same information (the standard deviation is simply the square root of the variance). Does it matter which measure we report? In fact, it does!
Generally speaking, standard deviation is more useful than variance from an interpretation standpoint because it is expressed in the same units as the original data (unlike variance, which is expressed in squared units of the original data). This makes standard deviation easier to understand and communicate. It can be compared directly with the mean and with individual values, which gives a clearer picture of how spread out the data are. This is especially true for normal distributions.
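The standard deviation is especially easy to interpret for a normal distribution. In that case, fixed shares of values fall within whole multiples of it around the mean: roughly 68% within one standard deviation, 95% within two, and 99.7% within three. The minimal check below assumes a perfectly normal distribution and uses the standard library's `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Share of values lying within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

The shares are stated in standard-deviation units, so they apply to any normally distributed variable, whatever its mean and scale.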
For example, let’s assume we have a dataset of annual incomes and derive the following measures of central tendency and dispersion:
- Mean income: 50,000 (dollars)
- Standard deviation: 10,000 (dollars)
- Variance: 100,000,000 (square dollars)
Interpreting the standard deviation is straightforward: a typical income lies within about $10,000 of the average income of $50,000. Interpreting the variance is trickier: the average squared deviation from the mean income is 100,000,000 square dollars, a unit with no everyday meaning. As you can see, the variance is not directly meaningful without further mathematical manipulation (i.e., taking the square root to find the standard deviation).
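The unit conversion can be checked in a few lines of Python. The four incomes are hypothetical, picked only so that the mean and the population variance reproduce the example's figures. The point is that the standard deviation comes back in dollars, while the variance stays in square dollars.

```python
import math
import statistics

# Variance from the income example, in square dollars
variance = 100_000_000
std_dev = math.sqrt(variance)  # the square root brings the value back to dollars

# Hypothetical incomes chosen so the summary statistics match the example
incomes = [40_000, 40_000, 60_000, 60_000]
mean_income = statistics.mean(incomes)
var_income = statistics.pvariance(incomes)  # population variance (divides by N)
sd_income = statistics.pstdev(incomes)      # population standard deviation

print(f"Mean: ${mean_income:,.0f}")
print(f"Variance: {var_income:,.0f} square dollars")
print(f"Standard deviation: ${sd_income:,.0f} (square root of the variance: {std_dev:,.0f})")
```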
Why Does Variance Use Square Units?
Suppose we calculate a mean deviation by summing the differences between each data point and the mean. The positive and negative differences (from values above and below the mean) always cancel each other out, so the sum is exactly zero. Variance squares the deviations so that none of them is negative (a squared deviation is zero only when a value equals the mean). This prevents positive and negative differences from cancelling out.
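The cancellation is easy to see on a small, hypothetical dataset. The raw deviations from the mean sum to zero, while the squared deviations do not. Averaging the squared deviations gives the (population) variance.

```python
# Hypothetical data whose mean (5) is a whole number, so the arithmetic is exact
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)

# Raw deviations: values below the mean give negative differences
deviations = [v - mean for v in values]
print(sum(deviations))  # 0.0: positive and negative deviations cancel exactly

# Squared deviations are never negative, so they cannot cancel
squared_deviations = [d ** 2 for d in deviations]
variance = sum(squared_deviations) / len(values)
print(variance)         # 4.0 (in squared units)
print(variance ** 0.5)  # 2.0: the standard deviation, back in the original units
```

With non-integer data, the first sum may print a tiny value such as 1e-15 instead of exactly zero. That comes from floating-point rounding, not from a real imbalance in the deviations.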
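Working with squared units also has some useful mathematical properties. For example, the variance of a sum of independent variables equals the sum of their variances. In addition, the mean is the single value that minimizes the sum of squared deviations.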
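One such property is additivity. For independent variables, the variance of their sum equals the sum of their variances, Var(X + Y) = Var(X) + Var(Y), whereas standard deviations do not add this way. The minimal check below uses two fair six-sided dice as a hypothetical example. It enumerates all 36 equally likely outcomes and uses `fractions.Fraction` so the result is exact rather than rounded.

```python
import math
from fractions import Fraction
from itertools import product


def population_variance(values):
    """Average squared deviation from the mean, computed exactly with Fractions."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


faces = range(1, 7)  # one fair six-sided die; each face equally likely

# All 36 equally likely outcomes of two independent dice, summed
sums = [a + b for a, b in product(faces, faces)]

var_one_die = population_variance(faces)
var_sum = population_variance(sums)
print(var_one_die, var_sum, var_sum == 2 * var_one_die)  # 35/12 35/6 True

# Standard deviations do not add: sqrt(35/6) is about 2.42,
# while 2 * sqrt(35/12) is about 3.42
sd_sum = math.sqrt(var_sum)
print(round(sd_sum, 2), round(2 * math.sqrt(var_one_die), 2))
```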