The t-Distribution

The T-DISTRIBUTION is a probability distribution that has a mean of 0 and is symmetrical and bell-shaped, similar to the normal distribution, but with heavier tails.  The t-distribution provides more accurate and conservative estimates of population parameters when dealing with small samples (n < 30) or when the population standard deviation is unknown (which is usually the case in social science research).

The shape of the t-distribution — how tall/short the center of the distribution is and how thin/thick the tails of the distribution are (i.e., the dispersion of the distribution) — is determined by the DEGREES OF FREEDOM (df).  The degrees of freedom for a single sample are equal to the sample size minus one; as a formula: df = n - 1.  As degrees of freedom increase, the t-distribution approaches the normal distribution.

To interpret a t-distribution, you will need to reference a T-DISTRIBUTION TABLE (i.e., a T-TABLE).  Using a t-table is similar to using a z-table:

  • Rows correspond to different degrees of freedom 
  • Columns correspond to different confidence levels (90%, 95%, 99%) or SIGNIFICANCE LEVELS (α), which are equal to 1 minus the confidence level (α = 0.10, 0.05, 0.01)
  • Table cells report the CRITICAL VALUES of the t-distribution, given the degrees of freedom and the confidence level/significance level; critical values are helpful in hypothesis testing and determining confidence intervals
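
When statistical software is available, critical values can also be computed directly instead of being read from a printed table.  Below is a minimal sketch in Python using scipy (the sample size of n = 15 and the 95% confidence level are hypothetical, chosen only for illustration):

    from scipy import stats

    n = 15                    # hypothetical sample size
    df = n - 1                # degrees of freedom: df = n - 1
    alpha = 0.05              # significance level for a 95% confidence level

    # Two-tailed critical value: the t-value that leaves alpha/2 in each tail
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    print(round(t_crit, 3))   # approximately 2.145, matching a t-table at df = 14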

Sample Statistics

The SAMPLE MEAN (X̄) is a measure of central tendency that represents the average value of a variable in sample data.  It is calculated in the same way that the population mean (μ) is calculated: by summing all the observations for a variable in the sample and dividing by the number of observations.

The SAMPLE STANDARD DEVIATION (s) is a measure of the dispersion or spread of the values in a sample around the sample mean.  It quantifies the amount of variation or dispersion of a set of values.  It is calculated in a similar manner to the population standard deviation (σ), but with one notable difference: instead of dividing by the total number of observations, we divide by n-1. Dividing by n-1 produces a larger value (more variation) than dividing by n.  The reason why we would find this appealing when working with sample data is simple and straightforward: whenever we use a sample instead of the entire population, there is the possibility of random error being introduced into our statistical analysis; calculating the standard deviation with n-1 errs on the side of caution by assuming larger variation.
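
As a quick illustration of the n-1 divisor, here is a small sketch in Python using numpy (the data values are invented for illustration):

    import numpy as np

    values = np.array([4, 8, 6, 5, 3, 7])   # hypothetical sample data

    s_sample = np.std(values, ddof=1)   # divides by n - 1 (sample standard deviation)
    s_pop = np.std(values, ddof=0)      # divides by n (population formula)

    print(round(s_sample, 3))   # about 1.871 (slightly larger, more conservative)
    print(round(s_pop, 3))      # about 1.708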

The STANDARD ERROR OF THE MEAN (s.e.) is a measure of how much the sample mean (X̄) is expected to vary from the true population mean (μ) — in other words, it tells us how precise the sample mean is as an estimate of the population mean.  The standard error of the mean is calculated by dividing the sample standard deviation by the square root of the number of observations. The standard error of the mean decreases as the sample size increases.  Mathematically, the standard error of the mean is inversely related to the square root of the sample size (n). As the standard error of the mean decreases, the margin of error and confidence intervals narrow, and the sample mean becomes a more precise and reliable estimate of the population mean.  This is tied to the central limit theorem: 

Larger samples tend to provide a better representation of the population

→ The larger (and more representative) the sample, the more closely the sampling distribution of the sample mean approaches a normal distribution

→ As the sampling distribution of the sample mean approaches normality, we can make more accurate and robust inferences
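
To see the arithmetic behind this, here is a brief sketch of the standard error calculation in Python (the sample standard deviation of 10 is hypothetical):

    import numpy as np

    s = 10.0                     # hypothetical sample standard deviation

    for n in (25, 100, 400):
        se = s / np.sqrt(n)      # standard error of the mean: s / sqrt(n)
        print(n, round(se, 2))   # 25 -> 2.0, 100 -> 1.0, 400 -> 0.5

Note that quadrupling the sample size only cuts the standard error in half, because the relationship runs through the square root of n.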

Beware of Outliers!

Whether outliers (i.e., extreme values) are likely to skew results in a normal distribution is based in large part on sample size.  In small samples, outliers can disproportionately affect the sample mean, the sample standard deviation, and, as a result, the standard error of the mean.  This, in turn, can lead us to make generalizations about the population parameters based on inaccurate information.  Thus, it is important to identify, investigate, and decide how to handle outliers (i.e., include, exclude, adjust/transform, or consider separately) based on their potential impact and the context of the study. 

Random Samples and Post-Stratification Weighting

With random samples, there is no guarantee that a sample will perfectly represent the population when it comes to various characteristics that may be relevant to the concept we are seeking to understand, explain, and/or predict, so POST-STRATIFICATION WEIGHTING is often needed.  With post-stratification weighting, sample data are WEIGHTED (adjusted) so the sample better mirrors the overall population.  This corrects for potential biases and makes the sample more representative of the population.  

Post-stratification weighting involves the following steps:

  1. Identify STRATA, or subgroups that share a specific characteristic, such as age, race, gender, income, education level, or other socio-economic and/or demographic factors
  2. Identify the proportions of the population falling into each STRATUM (singular of “strata”)
    • NOTE: To calculate population proportions, population-level data (such as census data) must be available for the characteristics you plan to use to weight your sample data
  3. Calculate the proportion of the sample that falls into each stratum (ex: the percentage of the sample who are male)
  4. Calculate a weight for each stratum; this is usually the ratio of the population proportion for that stratum to the sample proportion for that stratum (see the sketch after this list)
    • A weight of “1” means that the proportion of the sample for that stratum matches the proportion of the population
    • A weight of greater than/less than “1” means the proportion of the sample for that stratum does not match the proportion of the population
  5. Apply the weights to sample data in each stratum, which adjusts the survey data to better reflect the distribution of these characteristics in the overall population
    • When the weight associated with a stratum = 1, no adjustment is necessary because this stratum is perfectly representative of the population
    • When the weight associated with a stratum > 1, that stratum is said to have been UNDERREPRESENTED (i.e., included in the sample in lower proportions than what is found in the population); this adjusts values for this stratum so they count more heavily in the overall analysis
    • When the weight associated with a stratum < 1, that stratum is said to have been OVERREPRESENTED (i.e., included in the sample in higher proportions than what is found in the population); this adjusts values for this stratum so they count less heavily in the overall analysis
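
As a small worked sketch of steps 3 through 5 in Python (the proportions and survey values are made up; in practice, the population proportions would come from census-type data):

    # Hypothetical gender strata: population is 50/50, but the sample skews female
    population_prop = {"male": 0.50, "female": 0.50}
    sample_prop = {"male": 0.40, "female": 0.60}

    # Step 4: weight = population proportion / sample proportion
    weights = {stratum: population_prop[stratum] / sample_prop[stratum]
               for stratum in population_prop}
    print(weights)   # males: 1.25 (underrepresented); females: ~0.83 (overrepresented)

    # Step 5: apply the weights, e.g., when averaging a survey variable
    stratum_means = {"male": 62.0, "female": 70.0}   # hypothetical stratum means
    weighted_mean = sum(sample_prop[s] * weights[s] * stratum_means[s]
                        for s in sample_prop)
    print(round(weighted_mean, 1))   # 66.0, versus an unweighted mean of 66.8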

Generalizability: A Function of Representativeness and Sample Size

Generalizability relies heavily on the representativeness of the sample (i.e., the extent to which a sample’s composition mirrors that of the population).  With probability sampling, members of a population have a known chance of selection, which minimizes selection bias and helps ensure that the sample composition accurately reflects the characteristics of the population.  With non-probability sampling, however, SAMPLING BIAS and/or SELF-SELECTION BIAS may result in a non-representative sample: the sample composition does not accurately reflect the characteristics of the population.

Generalizability also relies heavily on sample size.  Larger sample sizes “typically do a better job at capturing the characteristics present in the population than do smaller samples” (Meier, Brudney, and Bohte, 2011, p. 178).  Larger sample sizes tend to provide more reliable sample statistics and more precise estimates of population parameters.  Furthermore, larger sample sizes increase the statistical power of a study, making it easier to identify statistically significant relationships and differences.

Populations vs. Samples

“A POPULATION is the total set of items that we are concerned about” (Meier, Brudney, and Bohte, 2011,  p. 173).  In other words, the population is the complete set of individuals or items that share a common characteristic or set of characteristics.  We are often interested in population PARAMETERS, i.e., numerical values that are fixed and describe a characteristic of a population, such as the population mean (μ), variance (σ²), and standard deviation (σ).  

“A SAMPLE is a subset of a population” (Meier, Brudney, and Bohte, 2011, p. 173).  There are two different types of samples: probability samples and non-probability samples. In a PROBABILITY SAMPLE, all members of the population have a KNOWN CHANCE of being selected as part of the sample.  To construct a probability sample, you will need to obtain a list of the entire population; this list then serves as the SAMPLING FRAME from which the sample will be selected/drawn. An example of a probability sample is a RANDOM SAMPLE, in which all members of the population have an equal chance of being selected in a sample. In a NON-PROBABILITY SAMPLE, some members of the population have NO CHANCE of being selected as part of the sample (in other words, the probability of selection cannot be determined). An example of a non-probability sample is a CONVENIENCE SAMPLE, in which the sample is selected based on convenience (i.e., as a result of being easy to contact or reach).

“A STATISTIC is a measure that is used to summarize a sample” (Meier, Brudney, and Bohte, 2011, p. 173), such as the measures of central tendency (ex: sample mean, X̄) and dispersion (ex: sample standard deviation, s) for a variable.  In order to treat sample findings as GENERALIZABLE to the population (i.e., use sample statistics as reliable estimates of the population parameters), the sample should be a probability sample. 

Why Are Only Probability Samples Generalizable?

Probability samples are more representative of the population.  Furthermore, in probability sampling, the sampling distribution of the sample statistic (e.g., sample mean) can be determined based on statistical principles.  Thus, probability sampling allows us to calculate measures such as margins of error and confidence levels, which account for uncertainty in our sample statistics and capture how reliably they estimate population parameters. 

In contrast, non-probability samples lack a clear and defined sampling distribution, making it impossible to accurately estimate the variability of the sample statistic.

Inferential Statistics: The Basics

INFERENTIAL STATISTICS are “quantitative techniques [that can be used] to generalize from a sample to a population” (Meier, Brudney, and Bohte, 2011, p. 173).  When done correctly and with a large enough sample, the results obtained from a sample can be generalized to the population from which the sample was taken.  The MARGIN OF ERROR provides a range around a sample estimate within which the true population parameter is expected to lie (once this range is added to and subtracted from our point estimate, we call the result a CONFIDENCE INTERVAL), and the CONFIDENCE LEVEL indicates the probability that the population parameter falls within this interval.  Margins of error, confidence intervals, and confidence levels help quantify how precise our estimates are.

For example, every time you hear a news broadcast report President Biden’s job approval rating, you are receiving inferences based on a sample of the population.  Naturally, it would be too costly and take too long to contact everyone in the United States to ask them how well Biden is doing as president.  Instead, a random sample of Americans is used to generate Biden’s job approval rating.  Then, depending on the sample size, the margin of error (MOE) is calculated; this accounts for the variability in the estimate that results from not asking every American how well Biden is doing.  If CNN reports that Biden’s job approval is 44% with a ±3-point MOE at a 95% confidence level, we are 95% confident that the true job approval rating lies between 41% and 47%.   
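
A rough sketch of how such a margin of error could be computed for a proportion (the sample size of n = 1,000 is assumed purely for illustration; polls report their own n):

    import math

    p_hat = 0.44   # sample estimate of the approval rating (44%)
    n = 1000       # hypothetical sample size
    z = 1.96       # z critical value for a 95% confidence level

    # Margin of error for a proportion: z * sqrt(p(1 - p) / n)
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    print(round(moe, 3))   # about 0.031, i.e., roughly +/- 3 percentage points

    # Confidence interval: point estimate +/- margin of error
    print(round(p_hat - moe, 3), round(p_hat + moe, 3))   # about 0.409 to 0.471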

Scale/Index Variables: A Measurement Technique Based on Z-Scores  

“A SCALE or INDEX is a composite measure combining several variables [or items] into a single unified measure of a concept” (Meier, Brudney, and Bohte, 2011, p. 144).  Scale/index variables are useful for several reasons:

  1. They can make analyzing a concept less complicated by reducing the number of variables
  2. They allow for more detailed analysis by
    • providing a more reliable and comprehensive measure of the underlying concept than any single item could provide on its own
    • transforming nominal level data into interval/ratio level data through summation (as the “scale” term indicates)
  3. They provide a clearer interpretation of the data by summarizing the information from multiple items into single scores, making it easier to communicate findings

If two or more variables are measured along the same scale (for instance, binary dummy variables), the values for these variables for each observation can simply be added together to create a SUMMATIVE SCALE/INDEX variable.  If we have three such variables, the resulting summative scale/index variable will range from 0 to 3, use real numbers, have equal intervals between categories, and have an absolute zero point.  These characteristics describe a ratio-level variable.  You could also divide each observation’s summative score by the number of variables (for this example, dividing by 3) to transform it into a MEAN SCALE/INDEX variable that captures the average across all three variables, using the original scale (for this example, ranging from 0 to 1). 

If the variables you want to use when constructing a scale/index variable are measured along different scales, you would first standardize your variables so they are measured using the same scale (specifically, each following a standard normal distribution).  Then, the standardized values (i.e., z-scores) for these variables for each observation can simply be added together to create a summative scale/index variable with the characteristics of a ratio-level variable.
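
A minimal sketch of both approaches in Python (all item values are invented for illustration):

    import numpy as np

    # Three binary (0/1) items on the same scale, five hypothetical respondents
    item1 = np.array([1, 0, 1, 1, 0])
    item2 = np.array([1, 1, 0, 1, 0])
    item3 = np.array([0, 1, 1, 1, 0])

    summative_index = item1 + item2 + item3   # ranges from 0 to 3
    mean_index = summative_index / 3          # ranges from 0 to 1
    print(summative_index)                    # [2 2 2 3 0]

    # Two items measured on different scales: standardize first, then sum
    income = np.array([30_000, 52_000, 75_000, 41_000, 66_000], dtype=float)
    years_edu = np.array([12, 16, 18, 14, 16], dtype=float)

    z_income = (income - income.mean()) / income.std(ddof=1)        # z-scores
    z_edu = (years_edu - years_edu.mean()) / years_edu.std(ddof=1)
    z_index = z_income + z_edu                # summative index of z-scores
    print(np.round(z_index, 2))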

Standardization, Z-Scores, and the Z-Table

Standard deviations give us aggregate information but not individual information: although standard deviations tell us how the values of a variable as a whole cluster around the mean, they do not tell us how close any particular score lies to the mean.  This is where standardization, z-scores, and the standard normal distribution table are beneficial.

Standardization and the Standard Normal Distribution 

STANDARDIZATION is the process of transforming data into a STANDARD NORMAL DISTRIBUTION, which is a special normal distribution with a mean of 0 and a standard deviation of 1: Z ~ N(0,1).  Standardization allows for comparison between datasets or variables with different units or scales.  For example, if you want to directly compare SAT and ACT scores (which are based on different scales),  you can standardize the data; this puts the scores on the same scale, allowing direct comparisons.  Standardization also allows us to more easily calculate the probability of observing a specific value for a given variable.

Z-Scores

Z-scores (i.e., standard scores) are the result of standardization; they put individual scores into context.  “A Z-SCORE is simply the number of standard deviations a score of interest lies from the mean of a [standard] normal distribution” (Meier, Brudney, and Bohte, 2011, p. 134).
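
As a formula, a z-score is the raw score minus the mean, divided by the standard deviation: z = (X − μ) / σ for population data, or z = (X − X̄) / s for sample data.  A positive z-score lies above the mean, a negative z-score lies below it, and a z-score of 0 sits exactly at the mean.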

Using a Standard Normal Distribution Table

Once you have standardized your variable(s) and calculated z-scores for the values of interest, you can use the STANDARD NORMAL DISTRIBUTION TABLE (i.e., Z-TABLE) to determine a value’s probability.  Normal distribution tables can also be used to find p-values for z-tests.

Below are some tips for reading a standard normal distribution table:

  • Round the z-score to the nearest hundredth
  • Familiarize yourself with the layout of the standard normal distribution table:
    • Row and column headers define the z-score
      • Read down the first column for the ones and tenths places of your number
      • Read along the top row for the hundredths place
    • Table cells represent the area under the curve to the left of a z-score
  • To locate the probability of a variable taking on a certain value:
    • Split the z-score into a number to the nearest tenth and one to the nearest hundredth
    • The intersection of the row from the first part and the column from the second part will give you the value associated with your z-score
    • This value represents the proportion of the data set that lies below the value corresponding to your z-score in a standard normal distribution
      • For example, the cumulative probability for z-score=1.23 is 0.8907, which means that there is an 89.07% chance that a randomly selected value from a standard normal distribution is less than 1.23
  • Calculating the difference between the area under the curve for two values/data points tells you the probability of variables taking on a range of values
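
Statistical software can also replace manual table lookups.  A small sketch in Python using scipy (the z-scores are arbitrary, chosen to match and extend the example above):

    from scipy import stats

    # Area under the standard normal curve to the left of z = 1.23
    print(round(stats.norm.cdf(1.23), 4))   # 0.8907, matching the table example above

    # Probability of a value falling between z = 0.50 and z = 1.50
    p_range = stats.norm.cdf(1.50) - stats.norm.cdf(0.50)
    print(round(p_range, 4))                # about 0.2417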

The Normal Distribution: The Basics

The NORMAL DISTRIBUTION (sometimes referred to as the Gaussian distribution) is a continuous probability distribution that can be found in many places: height, weight, IQ scores, test scores, errors, reaction times, etc.  Understanding that a variable is normally distributed allows you to:

  • predict the likelihood (i.e., probability) of observing certain values
  • apply various statistical techniques that assume normality
  • establish confidence intervals and conduct hypothesis tests

Characteristics of the Normal Distribution

There are several key characteristics of the normal distribution:

  • mean, median, and mode are equal and located at the center of the distribution
  • the distribution is symmetric about the mean (i.e., the left half of the distribution is a mirror image of the right half); “Scores above and below the mean are equally likely to occur so that half of the probability under the curve (0.5) lies above the mean and half (0.5) below” (Meier, Brudney, and Bohte, 2011, p. 132)
  • the distribution resembles a bell-shaped curve (i.e., highest at the mean and tapers off towards the tails)
  • the standard deviation determines the SPREAD of the distribution (i.e., its height and width): a smaller standard deviation results in a steeper curve, while a larger standard deviation results in a flatter curve
  •  the 68-95-99 RULE can be used to summarize the distribution and calculate probabilities of event occurrence:
    – approximately 68% of the data falls within ±1 standard deviation of the mean
    – approximately 95% of the data falls within ±2 standard deviations of the mean
    – approximately 99% of the data falls within ±3 standard deviations of the mean
  • there is always a chance that values will fall outside ±3 standard deviations of the mean, but the probability of occurrence is less than 1%
  • the tails of the distribution never touch the horizontal axis: the probability of an outlier occurring may be unlikely, but it is always possible; thus, the upper and lower tails approach, but never reach, 0%
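
The 68-95-99 rule itself can be recovered from the cumulative distribution function.  A quick check in Python using scipy:

    from scipy import stats

    for k in (1, 2, 3):
        # Area within +/- k standard deviations of the mean
        p = stats.norm.cdf(k) - stats.norm.cdf(-k)
        print(k, round(p * 100, 2))   # 1 -> 68.27%, 2 -> 95.45%, 3 -> 99.73%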

Why the Normal Distribution is Common in Nature: The Central Limit Theorem

The CENTRAL LIMIT THEOREM states that the distribution of sample means for INDEPENDENT, IDENTICALLY DISTRIBUTED (IID) random variables will approximate a normal distribution, even when the variables themselves are not normally distributed, as long as the sample is large enough.  Thus, as long as we have a sufficiently large random sample, we can make inferences about population parameters (what we are interested in) from sample statistics (what we are often working with).
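
A small simulation sketch in Python illustrates the theorem: even though the individual values below are drawn from a heavily skewed (exponential) population, the means of repeated samples cluster symmetrically around the population mean (the distribution, sample size, and number of repetitions are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(42)
    n = 100            # observations per sample
    reps = 10_000      # number of repeated samples

    # Draw many samples from a skewed population (exponential, mean = 50)
    sample_means = rng.exponential(scale=50, size=(reps, n)).mean(axis=1)

    print(round(sample_means.mean(), 1))   # close to the population mean of 50
    print(round(sample_means.std(), 1))    # close to sigma / sqrt(n) = 50 / 10 = 5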

What Does “IID” Mean?

Variables are considered independent if the value of one provides no information about, and has no effect on, the value of another.  Variables are considered identically distributed if they follow the same probability distribution (i.e., normal, Poisson, etc.).

Do Outliers Matter?

In a normal distribution based on a large number of observations, it is unlikely that outliers will skew results.  If you are working with data involving fewer observations, outliers are more likely to skew results; in these situations, you should identify, investigate, and decide how to handle outliers.

Example of a Normal Distribution: IQ Tests

Because the IQ test has been given millions of times, IQ scores represent a normal probability distribution.  On the IQ test, the mean, median, and mode are equal and fall in the middle of the distribution (100).  The standard deviation on the IQ test is 15; applying the 68-95-99 rule, we can say with reasonable certainty:

  • 68% of the population will score between 85 and 115, or ±1 standard deviation from the mean
  • 95% of the population will score between 70 and 130, or ±2 standard deviations from the mean
  • 99% of the population will score between 55 and 145, or ±3 standard deviations from the mean

Rarely will you encounter such a perfect normal probability distribution as the IQ test, but we can calculate z-scores to standardize (i.e., “normalize”) values for distributions that aren’t as normal as the IQ distribution.
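
For example, an IQ score of 130 corresponds to z = (130 − 100) / 15 = 2.0; looking up z = 2.00 in a standard normal distribution table gives approximately 0.9772, so roughly 97.7% of the population scores below 130.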

Probability Theory: The Basics

PROBABILITY is a branch of mathematics that deals with the likelihood or chance of different outcomes occurring in uncertain situations.  It quantifies how likely an event is to happen and is expressed as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty. 

Probabilities are important to understand because they give us predictive capabilities: probabilities are the closest thing we have to being able to predict the future.  Of course, this is not foolproof; there is always a chance that we are wrong, which is why we couch results/findings in terms of a 95% confidence interval and margin of error. 

Basic Law of Probability

“The BASIC LAW OF PROBABILITY . . . states the following: Given that all possible outcomes of a given event are equally likely, the probability of any specific outcome is equal to the ratio of the number of ways that that outcome could be achieved to the total number of ways that all possible outcomes can be achieved” (Meier, Brudney, and Bohte, 2011, p. 113).  This means that we can calculate the probability of a specific outcome as long as all of the possible (and equally likely) outcomes of an event are known.  For example:

  • the probability of getting heads with one coin flip is 1/2
  • the probability of rolling a three with one roll of a six-sided die is 1/6
  • the probability of drawing the ace of spades from a deck of cards is 1/52
  • the probability of drawing an ace from a deck of cards is 1/13 (although there are 52 cards in a deck, there are four aces; 4/52 reduces to 1/13)
  • the probability of drawing a heart from a deck of 52 cards is 1/4 (this is because there are four suits in each deck of cards)

All of these examples involve probabilities of the occurrence of single events.  As you can see, the process of calculating these probabilities is pretty straightforward: you divide the number of times a specific outcome can occur by the total number of possible outcomes.

Probability P(A) of a Single Event A

P(A) = Number of favorable outcomes / Total number of possible outcomes

This gets a little more complicated as we factor in other events; the manner in which we calculate the probability of an event occurring differs depending on whether we are looking at mutually exclusive events (i.e., events that cannot occur at the same time), non-mutually exclusive events (i.e., events that can occur at the same time), independent events (i.e., events in which the occurrence of one does not affect the occurrence of the other), an event occurring given the occurrence of a different event (i.e., conditional probability), etc.  Nevertheless, while different equations are used to calculate probabilities in these situations, the basic principles still hold: the probability of an event occurring falls between 0 (“never occurs”) and 1 (“always occurs”), and calculating this probability is based on counting possible outcomes.
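
A small sketch of these rules in Python, using the card-drawing probabilities from the examples above (single draws from a standard 52-card deck; the independence example assumes two separate draws with replacement):

    from fractions import Fraction

    p_ace = Fraction(4, 52)      # P(ace)
    p_king = Fraction(4, 52)     # P(king)
    p_heart = Fraction(13, 52)   # P(heart)
    p_ace_of_hearts = Fraction(1, 52)

    # Mutually exclusive events (one draw cannot be both an ace and a king):
    # P(A or B) = P(A) + P(B)
    print(p_ace + p_king)                      # 2/13

    # Non-mutually exclusive events (a card can be both an ace and a heart):
    # P(A or B) = P(A) + P(B) - P(A and B)
    print(p_ace + p_heart - p_ace_of_hearts)   # 4/13

    # Independent events (two draws with replacement): P(A and B) = P(A) * P(B)
    print(p_ace * p_heart)                     # 1/52

    # Conditional probability: P(A | B) = P(A and B) / P(B)
    print(p_ace_of_hearts / p_heart)           # 1/13, i.e., P(ace | heart)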

A Priori and Posterior Probabilities 

A PRIORI PROBABILITIES are initial probabilities of an event based on existing knowledge, theory, or general reasoning about the event.  All of the probabilities we have discussed thus far are a priori probabilities, because we know the possible outcomes of a coin flip, die roll, or card draw in advance.  By contrast, POSTERIOR PROBABILITIES are probabilities of an event after new evidence or information is taken into account.  A classic example of posterior probability is the Monty Hall problem.  Posterior probabilities are often calculated using BAYES’ THEOREM, which combines the prior probability with the likelihood of new evidence or information.  Frequency distributions provide the empirical data needed to estimate the probabilities used in Bayes’ theorem.
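
Bayes’ theorem itself can be written as P(A | B) = P(B | A) × P(A) / P(B): the posterior probability of A given new evidence B.  The Monty Hall result can also be checked empirically; below is a minimal simulation sketch in Python (the number of trials is arbitrary):

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            prize = random.randrange(3)    # door hiding the prize
            choice = random.randrange(3)   # contestant's initial pick
            # Host opens a door that is neither the pick nor the prize
            opened = next(d for d in range(3) if d != choice and d != prize)
            if switch:
                # Switch to the remaining unopened door
                choice = next(d for d in range(3) if d != choice and d != opened)
            wins += (choice == prize)
        return wins / trials

    print(round(play(switch=False), 2))   # about 0.33: the a priori chance of the first pick
    print(round(play(switch=True), 2))    # about 0.67: the posterior chance after the host's reveal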