Scale/Index Variables: A Measurement Technique Based on Z-Scores  

“A SCALE or INDEX is a composite measure combining several variables [or items] into a single unified measure of a concept” (Meier, Brudney, and Bohte, 2011, p. 144).  Scale/index variables are useful for several reasons:

  1. They can make analyzing a concept less complicated by reducing the number of variables
  2. They allow for more detailed analysis by
    • providing a more reliable and comprehensive measure of the underlying concept than any single item could provide on its own
    • transforming nominal level data into interval/ratio level data through summation (as the “scale” term indicates)
  3. They provide a clearer interpretation of the data by summarizing the information from multiple items into single scores, making it easier to communicate findings

If two or more variables are measured along the same scale (for instance, binary dummy variables), the values of these variables for each observation can simply be added together to create a SUMMATIVE SCALE/INDEX variable.  If we have three such dummy variables, the resulting summative scale/index variable will range from 0 to 3, use real numbers, have equal intervals between categories, and have an absolute zero point.  These characteristics describe a ratio-level variable.  You could also divide each observation's summative score by the number of variables (for this example, 3) to transform it into a MEAN SCALE/INDEX variable that captures the average across all three variables, using the original scale (for this example, ranging from 0 to 1).
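The summative and mean scale/index calculations can be sketched in a few lines of Python.  This is only an illustration: the item names and respondent values are hypothetical and do not come from the text.

```python
# Hypothetical data: three binary (0/1) dummy variables for four respondents.
respondents = [
    {"voted": 1, "donated": 0, "volunteered": 1},
    {"voted": 0, "donated": 0, "volunteered": 0},
    {"voted": 1, "donated": 1, "volunteered": 1},
    {"voted": 0, "donated": 1, "volunteered": 0},
]
items = ["voted", "donated", "volunteered"]


def summative_score(row, items):
    """Add the item values together for one observation."""
    return sum(row[item] for item in items)


def mean_score(row, items):
    """Divide the summative score by the number of items."""
    return summative_score(row, items) / len(items)


summative = [summative_score(row, items) for row in respondents]
mean_scale = [mean_score(row, items) for row in respondents]

print(summative)   # [2, 0, 3, 1]
print(mean_scale)  # each value falls between 0 and 1
```

With three dummy variables, every summative score falls between 0 and 3 and every mean score falls between 0 and 1, matching the ranges described above.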

If the variables you want to use when constructing a scale/index variable are measured along different scales, you would first standardize your variables so they are measured using the same scale (specifically, each with a mean of 0 and a standard deviation of 1).  Then, the standardized values (i.e., z-scores) for these variables for each observation can simply be added together to create a summative scale/index variable.  Because z-scores can be negative and a score of 0 marks the mean rather than the absence of the concept, the resulting variable has equal intervals but no absolute zero point; it is best treated as an interval-level variable.
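A minimal sketch of this two-step procedure (standardize, then sum) is shown below.  The variables and values are hypothetical, and the population standard deviation (`pstdev`) is an assumption; use `statistics.stdev` instead if your data are a sample.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    return [(v - m) / s for v in values]


# Hypothetical variables measured on different scales.
income = [30, 45, 60, 75, 90]      # thousands of dollars
education = [10, 12, 14, 16, 18]   # years of schooling

z_income = z_scores(income)
z_education = z_scores(education)

# Summative scale/index: add the z-scores for each observation.
index = [zi + ze for zi, ze in zip(z_income, z_education)]

for value in index:
    print(round(value, 3))
```

In this example the middle observation sits at the mean of both variables, so its index score is 0; observations below the mean on both variables receive negative index scores.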

Standardization, Z-Scores, and the Z-Table

Standard deviations give us aggregate information but not individual information: a standard deviation summarizes how closely all values of a variable cluster around the mean, but it does not tell us how far any particular score lies from the mean.  This is where standardization, z-scores, and the standard normal distribution table are beneficial.

Standardization and the Standard Normal Distribution 

STANDARDIZATION is the process of rescaling data so that it has a mean of 0 and a standard deviation of 1.  When the original data are normally distributed, the standardized values follow the STANDARD NORMAL DISTRIBUTION, which is a special normal distribution with a mean of 0 and a standard deviation of 1: Z ~ N(0,1).  Standardization allows for comparison between datasets or variables with different units or scales.  For example, if you want to directly compare SAT and ACT scores (which are based on different scales), you can standardize the data; this puts the scores on the same scale, allowing direct comparisons.  Standardization also allows us to more easily calculate the probability of observing a value at or below a specific value for a given variable.
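The SAT/ACT comparison might look like the sketch below.  The means and standard deviations are hypothetical placeholders chosen for round arithmetic, not actual test statistics; substitute the real values for the population you are studying.

```python
def standardize(score, mean, sd):
    """Express a score as a number of standard deviations from the mean."""
    return (score - mean) / sd


# Hypothetical distribution parameters (NOT official test statistics).
SAT_MEAN, SAT_SD = 1050, 200
ACT_MEAN, ACT_SD = 21, 5

student_a = standardize(1250, SAT_MEAN, SAT_SD)  # took the SAT
student_b = standardize(28, ACT_MEAN, ACT_SD)    # took the ACT

print(student_a)  # 1.0
print(student_b)  # 1.4

# Once on the same scale, the two scores can be compared directly.
better = "Student B" if student_b > student_a else "Student A"
print(better)     # Student B
```

Under these assumed parameters, the ACT score of 28 lies farther above its mean than the SAT score of 1250 does, even though the raw numbers are not comparable.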

Z-Scores

Z-scores (i.e., standard scores) are the result of standardization; they put individual scores into context.  “A Z-SCORE is simply the number of standard deviations a score of interest lies from the mean of a [standard] normal distribution” (Meier, Brudney, and Bohte, 2011, p. 134).
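For reference, the definition above corresponds to the usual standardization formula.  The symbols are notation introduced here rather than taken from the text: x is the score of interest, μ is the mean of the variable, and σ is its standard deviation.

```latex
z = \frac{x - \mu}{\sigma}
```

As a hypothetical illustration, a score of 85 on a variable with a mean of 70 and a standard deviation of 10 gives z = (85 - 70) / 10 = 1.5, meaning the score lies 1.5 standard deviations above the mean.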

Using a Standard Normal Distribution Table

Once you have standardized your variable(s) and calculated z-scores for the values of interest, you can use the STANDARD NORMAL DISTRIBUTION TABLE (i.e., Z-TABLE) to determine the cumulative probability associated with a value, that is, the probability of observing a value at or below it.  Normal distribution tables can also be used to find p-values for z-tests.
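As a rough sketch of the p-value use mentioned above, Python's standard library `statistics.NormalDist` can stand in for the printed table.  The helper names and the test statistic of 1.96 are illustrative assumptions, not part of the original text.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def upper_tail_p_value(z):
    """One-sided p-value: area under the curve to the right of z."""
    return 1 - std_normal.cdf(z)


def two_sided_p_value(z):
    """Two-sided p-value: area in both tails beyond |z|."""
    return 2 * (1 - std_normal.cdf(abs(z)))


z_statistic = 1.96  # hypothetical z-test statistic
p_one_sided = upper_tail_p_value(z_statistic)
p_two_sided = two_sided_p_value(z_statistic)

print(round(p_one_sided, 3))  # 0.025
print(round(p_two_sided, 3))  # 0.05
```

The table (or `cdf`) gives the area to the left of the z-score, so tail areas are obtained by subtracting that value from 1.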

Below are some tips for reading a standard normal distribution table:

  • Round the z-score to the nearest hundredth
  • Familiarize yourself with the layout of the standard normal distribution table:
    • Row and column headers define the z-score
      • Read down the first column for the ones and tenths places of your number
      • Read along the top row for the hundredths place
    • Table cells represent the area under the curve to the left of a z-score
  • To locate the probability of a variable taking on a certain value:
    • Split the z-score into two parts: the ones and tenths places (e.g., 1.2 for a z-score of 1.23) and the hundredths place (e.g., 0.03)
    • The intersection of the row from the first part and the column from the second part will give you the value associated with your z-score
    • This value represents the proportion of the data set that lies below the value corresponding to your z-score in a standard normal distribution
      • For example, the cumulative probability for z-score=1.23 is 0.8907, which means that there is an 89.07% chance that a randomly selected value from a standard normal distribution is less than 1.23
  • Calculating the difference between the areas under the curve for two z-scores tells you the probability that the variable takes on a value between them
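The lookup steps above can be reproduced in Python, with `statistics.NormalDist` standing in for the printed table.  This is a sketch under stated assumptions: the helper names `table_row_and_column` and `probability_between` are hypothetical, and the range example (z between -1 and 1) is chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def table_row_and_column(z):
    """Split a z-score into the table's row (ones and tenths places)
    and column (hundredths place) after rounding to two decimals."""
    hundredths = round(abs(z) * 100)   # e.g., 1.23 -> 123
    row = (hundredths // 10) / 10      # e.g., 123 -> 1.2
    column = (hundredths % 10) / 100   # e.g., 123 -> 0.03
    if z < 0:
        row = -row                     # negative z-scores use negative rows
    return row, column


def probability_between(lower_z, upper_z):
    """Area under the curve between two z-scores."""
    return std_normal.cdf(upper_z) - std_normal.cdf(lower_z)


row, column = table_row_and_column(1.23)
below = std_normal.cdf(1.23)               # area to the left of z = 1.23
between = probability_between(-1.0, 1.0)   # area between z = -1 and z = 1

print(row, column)        # 1.2 0.03
print(round(below, 4))    # 0.8907
print(round(between, 4))  # 0.6827
```

The second printed value matches the table entry for z = 1.23, and the third shows that about 68% of a standard normal distribution lies within one standard deviation of the mean.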