Standardization, Z-Scores, and the Z-Table

The standard deviation gives us aggregate information but not individual information: it describes how tightly all values of a variable cluster around the mean, but it does not tell us how far a particular score lies from the mean.  This is where standardization, z-scores, and the standard normal distribution table are useful.

Standardization and the Standard Normal Distribution 

STANDARDIZATION is the process of rescaling data so that it has a mean of 0 and a standard deviation of 1.  When the original variable is normally distributed, the standardized values follow the STANDARD NORMAL DISTRIBUTION, which is a special normal distribution with a mean of 0 and a standard deviation of 1: Z ~ N(0,1).  Standardization allows for comparison between datasets or variables with different units or scales.  For example, if you want to directly compare SAT and ACT scores (which are based on different scales), you can standardize the data; this puts the scores on the same scale, allowing direct comparisons.  Standardization also makes it easier to calculate the probability of observing a value at or below a given score for a variable.
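As a minimal sketch of this rescaling, the Python snippet below subtracts the mean from each value and divides by the standard deviation.  The function name `standardize` and the test scores are made up for illustration, and the population standard deviation is assumed (swap in `statistics.stdev` if you are working with a sample estimate).

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)  # population SD; an assumption for this sketch
    return [(v - mu) / sigma for v in values]

# Hypothetical test scores, invented for illustration
scores = [1000, 1100, 1200, 1300, 1400]
z = standardize(scores)

# The standardized values have mean 0 and standard deviation 1
print(round(fmean(z), 10), round(pstdev(z), 10))
```

Note that the rescaling changes the units of the data, not the shape of its distribution.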

Z-Scores

Z-scores (i.e., standard scores) are the result of standardization; they put individual scores into context.  “A Z-SCORE is simply the number of standard deviations a score of interest lies from the mean of a [standard] normal distribution” (Meier, Brudney, and Bohte, 2011, p. 134).
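To illustrate how a z-score puts an individual score into context, the sketch below compares one SAT score with one ACT score.  The means and standard deviations used here are hypothetical placeholders, not official figures, and `z_score` is a helper name chosen for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical population parameters, for illustration only
sat_z = z_score(1250, mean=1050, sd=200)
act_z = z_score(28, mean=21, sd=5)

# The score with the larger z-score is further above its own mean
print(f"SAT z = {sat_z:.2f}, ACT z = {act_z:.2f}")
```

Under these assumed parameters, the ACT score is the stronger result even though the raw numbers cannot be compared directly.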

Using a Standard Normal Distribution Table

Once you have standardized your variable(s) and calculated z-scores for the values of interest, you can use the STANDARD NORMAL DISTRIBUTION TABLE (i.e., Z-TABLE) to find the cumulative probability associated with a value, that is, the probability of observing a value at or below it.  Normal distribution tables can also be used to find p-values for z-tests.

Below are some tips for reading a standard normal distribution table:

  • Round the z-score to the nearest hundredth
  • Familiarize yourself with the layout of the standard normal distribution table:
    • Row and column headers define the z-score
      • Read down the first column for the ones and tenths places of your number
      • Read along the top row for the hundredths place
    • Table cells represent the area under the curve to the left of a z-score
  • To locate the cumulative probability for a z-score:
    • Split the z-score into two parts: the ones and tenths places (e.g., 1.2 for a z-score of 1.23) and the hundredths place (e.g., 0.03)
    • The intersection of the row from the first part and the column from the second part will give you the value associated with your z-score
    • This value represents the proportion of a standard normal distribution that lies below your z-score
      • For example, the cumulative probability for z-score=1.23 is 0.8907, which means that there is an 89.07% chance that a randomly selected value from a standard normal distribution is less than 1.23
  • Calculating the difference between the areas under the curve for two z-scores gives you the probability that the variable takes on a value between them
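The lookup steps above can also be reproduced in code.  The sketch below uses `NormalDist` from Python's standard `statistics` module in place of a printed table; the variable names are chosen for illustration, and the row/column split assumes a positive z-score.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # Z ~ N(0,1)

z = 1.23
hundredths = round(z * 100)       # 123
row = (hundredths // 10) / 10     # ones and tenths places: 1.2
col = (hundredths % 10) / 100     # hundredths place: 0.03

# Area under the curve to the left of z (what the table cell holds)
area_below = std_normal.cdf(z)
print(row, col, round(area_below, 4))

# Area to the right of z: subtract from 1
area_above = 1 - area_below

# Probability of falling between two z-scores: difference of the areas
area_between = std_normal.cdf(1.23) - std_normal.cdf(-1.00)
print(round(area_above, 4), round(area_between, 4))
```

The cumulative value for z = 1.23 matches the 0.8907 read from the table.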

The Normal Distribution: The Basics

The NORMAL DISTRIBUTION (sometimes referred to as the Gaussian distribution) is a continuous probability distribution that can be found in many places: height, weight, IQ scores, test scores, errors, reaction times, etc.  Understanding that a variable is normally distributed allows you to:

  • predict the likelihood (i.e., probability) of observing certain values
  • apply various statistical techniques that assume normality
  • establish confidence intervals and conduct hypothesis tests

Characteristics of the Normal Distribution

There are several key characteristics of the normal distribution:

  • mean, median, and mode are equal and located at the center of the distribution
  • the distribution is symmetric about the mean (i.e., the left half of the distribution is a mirror image of the right half); “Scores above and below the mean are equally likely to occur so that half of the probability under the curve (0.5) lies above the mean and half (0.5) below” (Meier, Brudney, and Bohte, 2011, p. 132)
  • the distribution resembles a bell-shaped curve (i.e., highest at the mean and tapers off towards the tails)
  • the standard deviation determines the SPREAD of the distribution (i.e., its height and width): a smaller standard deviation results in a steeper curve, while a larger standard deviation results in a flatter curve
  • the 68-95-99.7 RULE (also called the empirical rule) can be used to summarize the distribution and calculate probabilities of event occurrence:
    – approximately 68% of the data falls within ±1 standard deviation of the mean
    – approximately 95% of the data falls within ±2 standard deviations of the mean
    – approximately 99.7% of the data falls within ±3 standard deviations of the mean
  • there is always a chance that values will fall outside ±3 standard deviations of the mean, but the probability of occurrence is only about 0.3%
  • the tails of the distribution never touch the horizontal axis: the probability of an outlier occurring may be unlikely, but it is always possible; thus, the upper and lower tails approach, but never reach, 0%
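The percentages in the rule above can be checked numerically.  This is a small sketch using the standard normal distribution from Python's `statistics` module; `within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Probability of falling within ±k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    inside = within(k)
    print(f"±{k} SD: {inside:.4f} inside, {1 - inside:.4f} outside")
```

Because every normal distribution can be standardized, the same proportions hold for any mean and standard deviation.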

Why the Normal Distribution is Common in Nature: The Central Limit Theorem

The CENTRAL LIMIT THEOREM states that the distribution of sample means for INDEPENDENT, IDENTICALLY DISTRIBUTED (IID) random variables with finite variance will approximate a normal distribution, even when the variables themselves are not normally distributed, provided the sample is large enough.  Thus, as long as we have a sufficiently large random sample, we can make inferences about the population parameters (what we are interested in) from sample statistics (what we often are working with).
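A short simulation can make the theorem concrete.  The sketch below repeatedly draws samples from an exponential distribution (which is strongly skewed, not normal) and looks at the distribution of the sample means.  The sample size of 50, the 5,000 repetitions, and the fixed seed are arbitrary choices for illustration.

```python
import random
from statistics import fmean, stdev

random.seed(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n IID draws from an exponential distribution (mean 1, SD 1)."""
    return fmean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5000)]

center = fmean(means)   # should be close to the population mean, 1
spread = stdev(means)   # should be close to 1 / sqrt(n)

# Share of sample means within ±1 standard deviation of their center;
# for a normal distribution this is roughly 68%
share = sum(abs(m - center) <= spread for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(share, 3))
```

Even though individual draws are skewed, the sample means cluster symmetrically around the population mean, with a spread that shrinks as the sample size grows.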

What Does “IID” Mean?

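Variables are considered independent if the value of one gives no information about the value of another; that is, knowing one outcome does not change the probability of any other.  (Independence is not the same as being mutually exclusive: mutually exclusive events cannot occur together, so knowing that one occurred tells you the other did not.)  Variables are considered identically distributed if they are drawn from the same probability distribution with the same parameters (e.g., all normal with the same mean and standard deviation, or all Poisson with the same rate).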

Do Outliers Matter?

In a data set based on a large number of observations, it is unlikely that a few outliers will skew results such as the mean.  If you are working with data involving fewer observations, outliers are more likely to skew results; in these situations, you should identify, investigate, and decide how to handle outliers.
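The arithmetic behind this point can be sketched quickly.  The example below adds a single extreme value to a small and a large data set and measures how far the mean moves; the values 100 and 1000 and the helper name `mean_shift` are invented for illustration.

```python
from statistics import fmean

def mean_shift(n, typical=100, outlier=1000):
    """Change in the mean when one outlier joins n typical observations."""
    base = [typical] * n
    with_outlier = base + [outlier]
    return fmean(with_outlier) - fmean(base)

small = mean_shift(9)     # 9 typical values plus 1 outlier
large = mean_shift(999)   # 999 typical values plus 1 outlier
print(round(small, 2), round(large, 2))
```

The same outlier moves the mean of the small data set by 90 points but the mean of the large one by less than 1 point.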

Example of a Normal Distribution: IQ Tests

IQ scores are a standard example of a normal probability distribution: the test has been given millions of times, and scores are scaled so that they are approximately normally distributed.  On the IQ test, the mean, median, and mode are equal and fall in the middle of the distribution (100).  The standard deviation on the IQ test is 15; applying the 68-95-99.7 rule, we can say with reasonable certainty:

  • 68% of the population will score between 85 and 115, or ±1 standard deviation from the mean
  • 95% of the population will score between 70 and 130, or ±2 standard deviations from the mean
  • 99.7% of the population will score between 55 and 145, or ±3 standard deviations from the mean
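The IQ example can be tied back to z-scores with a few lines of code.  This sketch assumes IQ scores are exactly normal with a mean of 100 and a standard deviation of 15; the score of 130 is an arbitrary value of interest.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # assumed IQ distribution

score = 130
z = (score - iq.mean) / iq.stdev   # standard deviations above the mean

# Proportion of the population expected to score below 130
below = iq.cdf(score)

# Proportion expected to score between 85 and 115 (±1 standard deviation)
between = iq.cdf(115) - iq.cdf(85)

print(z, round(below, 4), round(between, 4))
```

A score of 130 sits 2 standard deviations above the mean, so roughly 97.7% of the population would be expected to score below it.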

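Rarely will you encounter such a perfect normal probability distribution as the IQ test, but we can still calculate z-scores to standardize values for variables that are only approximately normal.  Keep in mind that standardizing rescales the values without changing the shape of their distribution, so probabilities read from the z-table are accurate only to the extent that the variable is approximately normal.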