Hypotheses

Research Hypothesis

A RESEARCH HYPOTHESIS is a clear, specific, testable statement or prediction about the relationship between two or more variables (i.e., independent variable X explains variation in dependent variable Y).  The research hypothesis guides the direction of the study and outlines what the researcher expects to find.  A good hypothesis is:

  • based on existing knowledge (i.e., theory drives your predictions)
  • FALSIFIABLE (i.e., can be proven false if it is incorrect)

Some hypotheses are directional, positing that as values of one variable increase or decrease, values of the other variable increase or decrease:

  • There is a POSITIVE RELATIONSHIP between population density (independent variable) and crime rates (dependent variable) — as population density increases, crime rates increase
  • There is a NEGATIVE RELATIONSHIP between education level (independent variable) and teen pregnancy (dependent variable) — as education increases, teen pregnancy decreases

These types of hypotheses are appropriate when working with variables that have direction (i.e., ordinal-, interval-, or ratio-level variables).

Some hypotheses merely posit that there is a relationship between two variables:

  • Women are more likely than men to vote (respondent sex is the independent variable; voting is the dependent variable) — there is a relationship between sex and voting

This type of hypothesis is common when working with nominal variables.  Because nominal variables have no direction, the relationship between a nominal variable and another variable cannot be stated in directional terms. Instead, the hypothesis should specify the type of relationship between the variables in terms of how differences in the dependent variable are linked with differences in the independent variable.

If, based on theory and existing knowledge, there are control variables that further explain variation in the dependent variable, they should be included in the research hypothesis:

  • if all variables involved have direction (i.e., ordinal-, interval-, or ratio-level variables), you would simply add the phrase “while controlling for” and then specify the control variables
  • if the control variable is nominal, you should specify the expected relationship

The research hypothesis is presented as H1.  If you have more than one research hypothesis, they would be presented as H1, H2, H3, etc.

Null Hypothesis

Null means having no value.  By extension, the NULL HYPOTHESIS is a statement that there is no relationship between the variables being studied. While the research hypothesis serves as a foundation for conducting empirical research by guiding the direction of the study and outlining what the researcher or administrator expects to find, the null hypothesis forms the basis of hypothesis testing.  The null hypothesis is presented as H0.

Alternative Hypothesis

An ALTERNATIVE HYPOTHESIS is proposed as an alternative to the null hypothesis; it indicates that there is a relationship between the variables being studied. Alternative hypotheses can be:

  • directional (one-tailed), specifying a direction of the effect or difference
  • non-directional (two-tailed), merely stating that there is a difference, without specifying the direction
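
For instance, using the population-density example above, one way to write the competing hypotheses (the exact wording here is illustrative, not drawn from a particular study) would be:

  • H0: There is no relationship between population density and crime rates
  • Ha (directional/one-tailed): As population density increases, crime rates increase
  • Ha (non-directional/two-tailed): Crime rates differ across areas with different levels of population density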

Research hypotheses and alternative hypotheses are conceptually similar but serve different functions: a research hypothesis’s broader context is more aligned with scientific inquiry and theory testing, whereas an alternative hypothesis is specifically formulated for statistical testing against the null hypothesis. The alternative hypothesis is presented as Ha.

Skewed Probability Distributions

In a normal probability distribution, the mean, median, and mode are equal and fall at the center of the distribution, with the values symmetrically distributed around the mean.  Skewed probability distributions are distributions where the values are not symmetrically distributed around the mean.   

If a distribution is POSITIVELY SKEWED (i.e., RIGHT-SKEWED), the mean and median are greater than (i.e., fall to the right of) the mode.  This produces a distribution in which the tail on the right side is longer or fatter than the tail on the left side.  

If a distribution is NEGATIVELY SKEWED (i.e., LEFT-SKEWED), the mean and median are less than (i.e., fall to the left of) the mode.  This produces a distribution in which the tail on the left side is longer or fatter than the tail on the right side. 

Skewness can affect the interpretation of the mean, median, and mode.  For instance, in a positively skewed distribution, the mean will be higher than the median, which might give a misleading impression of the typical value if one only considers the mean.  Furthermore, many statistical techniques assume that the data follows a normal distribution (i.e., no skewness).  When data is skewed, these assumptions are violated, which can lead to errors in statistical inference.
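
As a quick illustration, the short Python sketch below uses a small set of hypothetical income values with one extreme high value (a right-skewed pattern) to show the mean being pulled above the median:

  import statistics

  # Hypothetical incomes (in thousands); the single extreme value creates a long right tail
  incomes = [22, 25, 27, 28, 30, 31, 33, 35, 250]

  print(statistics.mean(incomes))     # about 53.4, pulled upward by the right tail
  print(statistics.median(incomes))   # 30, closer to the "typical" value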

Confidence Intervals

Confidence intervals provide a useful way to convey the uncertainty and reliability of an estimate, allowing researchers to make more informed conclusions about the population parameter.

When constructing a confidence interval for a sample mean, the critical value (t) for a given confidence level and number of degrees of freedom is multiplied by the standard error of the mean; this gives us the margin of error, which can then be added to and subtracted from the sample mean to establish the upper and lower bounds of our confidence interval. 
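
The Python sketch below walks through these steps with hypothetical data; it assumes the scipy library is available to supply the t critical value:

  import math
  import statistics
  from scipy import stats

  sample = [12, 15, 14, 10, 13, 18, 16, 11, 14, 17]   # hypothetical observations
  n = len(sample)

  x_bar = statistics.mean(sample)            # sample mean
  s = statistics.stdev(sample)               # sample standard deviation (uses n - 1)
  se = s / math.sqrt(n)                      # standard error of the mean

  t_crit = stats.t.ppf(0.975, df=n - 1)      # critical value for a 95% confidence level
  moe = t_crit * se                          # margin of error

  lower, upper = x_bar - moe, x_bar + moe    # bounds of the confidence interval
  print(round(lower, 2), round(upper, 2))    # about 12.15 and 15.85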

The t-Distribution

The T-DISTRIBUTION is a probability distribution that has a mean of 0 and is symmetrical and bell-shaped, similar to the normal distribution, but with heavier tails.  The t-distribution provides more accurate and conservative estimates of population parameters when dealing with small samples (n<30) or when population standard deviations are unknown (which is usually the case in social science research).

The shape of the t-distribution — how tall/short the center of the distribution is and how thin/thick the tails of the distribution are (i.e., the dispersion of the distribution) — is determined by the DEGREES OF FREEDOM (df).  The degrees of freedom for a single sample equal the sample size minus one; as a formula: df = n - 1.  As degrees of freedom increase, the t-distribution approaches the normal distribution.

To interpret a t-distribution, you will need to reference a T-DISTRIBUTION TABLE (i.e., a T-TABLE).  Using a t-table is similar to using a z-table:

  • Rows correspond to different degrees of freedom 
  • Columns correspond to different confidence levels (90%, 95%, 99%) or SIGNIFICANCE LEVELS (α), which are equal to 1 minus the confidence level (α = 0.10, 0.05, 0.01)
  • Table cells report the CRITICAL VALUES of the t-distribution, given the degrees of freedom and the confidence level/significance level; critical values are helpful in hypothesis testing and determining confidence intervals
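
As a rough sketch of what a t-table lookup does, the snippet below (again assuming scipy is available) prints the two-tailed critical values for a 95% confidence level (α = 0.05) at several degrees of freedom, showing how they approach the normal (z) value of about 1.96 as df increases:

  from scipy import stats

  # 95% confidence, two-tailed: the critical value is the t quantile at 0.975
  for df in (5, 10, 30, 120):
      print(df, round(stats.t.ppf(0.975, df=df), 3))   # 2.571, 2.228, 2.042, 1.98

  print(round(stats.norm.ppf(0.975), 3))               # 1.96, the normal (z) benchmark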

Sample Statistics

The SAMPLE MEAN (X̄) is a measure of central tendency that represents the average value of a variable in sample data.  It is calculated in the same way that the population mean (μ) is calculated: by summing all the observations for a variable in the sample, and dividing by the number of observations.

The SAMPLE STANDARD DEVIATION (s) is a measure of the dispersion or spread of the values in a sample around the sample mean.  It quantifies the amount of variation or dispersion of a set of values.  It is calculated in a similar manner to the population standard deviation (σ), but with one notable difference: instead of dividing by the total number of observations, we divide by n-1. Dividing by n-1 produces a larger value (more variation) than dividing by n.  The reason this is appealing when working with sample data is simple and straightforward: whenever we use a sample instead of the entire population, there is the possibility of random error being introduced to our statistical analysis; calculating the standard deviation with n-1 errs on the side of caution by assuming larger variation.

The STANDARD ERROR OF THE MEAN (s.e.) is a measure of how much the sample mean (X̄) is expected to vary from the true population mean (μ) — in other words, it tells us how precise the sample mean is as an estimate of the population mean.  The standard error of the mean is calculated by dividing the sample standard deviation by the square root of the number of observations. The standard error of the mean decreases as the sample size increases.  Mathematically, the standard error of the mean is inversely related to the square root of the sample size (n). As the standard error of the mean decreases, the margin of error and confidence intervals narrow, and the sample mean becomes a more precise and reliable estimate of the population mean.  This is tied to the central limit theorem: 

Larger samples tend to provide a better representation of the population

→ The larger and more representative a sample is, the more closely the distribution of sample means (the sampling distribution) approximates a normal distribution

→ As the sampling distribution of the sample mean approaches a normal distribution, we can make more accurate and robust inferences
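
The short Python sketch below computes the three sample statistics described above from a small set of hypothetical observations, and shows how the standard error shrinks when the sample size grows (holding the standard deviation constant):

  import math
  import statistics

  sample = [4, 7, 6, 5, 8, 9, 5, 6]             # hypothetical observations

  x_bar = statistics.mean(sample)               # sample mean (X-bar)
  s = statistics.stdev(sample)                  # sample standard deviation (divides by n - 1)
  se = s / math.sqrt(len(sample))               # standard error of the mean

  print(round(x_bar, 2), round(s, 2), round(se, 2))

  # Quadrupling the sample size (with the same spread) cuts the standard error in half
  print(round(s / math.sqrt(4 * len(sample)), 2))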

Beware of Outliers!

Whether outliers (i.e., extreme values) are likely to skew results in a normal distribution is based in large part on sample size.  In small samples, outliers can disproportionately affect the sample mean, the sample standard deviation, and, as a result, the standard error of the mean.  This, in turn, can lead us to make generalizations about the population parameters based on inaccurate information.  Thus, it is important to identify, investigate, and decide how to handle outliers (i.e., include, exclude, adjust/transform, or consider separately) based on their potential impact and the context of the study. 
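
The following sketch, using hypothetical values, shows how a single extreme value in a small sample can inflate the sample mean, the sample standard deviation, and the standard error of the mean:

  import math
  import statistics

  without_outlier = [10, 11, 12, 13, 14]
  with_outlier = [10, 11, 12, 13, 90]          # 90 is the extreme value

  for data in (without_outlier, with_outlier):
      s = statistics.stdev(data)
      se = s / math.sqrt(len(data))
      print(statistics.mean(data), round(s, 2), round(se, 2))   # mean jumps from 12 to 27.2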

Random Samples and Post-Stratification Weighting

With random samples, there is no guarantee that samples will perfectly represent the population when it comes to various characteristics that may be relevant to the concept we are seeking to understand, explain, and/or predict, so POST-STRATIFICATION WEIGHTING is often needed.  With post-stratification weighting, sample data are WEIGHTED (adjusted) so the sample better mirrors the overall population.  This corrects for potential biases and makes the sample more representative of the population.

Post-stratification weighting involves the following steps:

  1. Identify STRATA, or subgroups that share a specific characteristic, such as age, race, gender, income, education level, or other socio-economic and/or demographic factors
  2. Identify the proportions of the population falling into each STRATUM (singular of “strata”)
    • NOTE: To calculate population proportions, population-level data (such as census data) must be available for the characteristics you plan to use to weight your sample data
  3. Calculate the proportion of the sample that falls into each stratum (ex: the percentage of the sample who are male)
  4. Calculate a weight for each stratum — usually, the ratio of the population proportion for a stratum to the sample proportion for that stratum (see the sketch following this list)
    • A weight of “1” means that the proportion of the sample for that stratum matches the proportion of the population
    • A weight of greater than/less than “1” means the proportion of the sample for that stratum does not match the proportion of the population
  5. Apply the weights to sample data in each stratum, which adjusts the survey data to better reflect the distribution of these characteristics in the overall population
    • When the weight associated with a stratum = 1, no adjustment is necessary because this stratum is perfectly representative of the population
    • When the weight associated with a stratum > 1, that stratum is said to have been UNDERREPRESENTED (i.e., included in the sample in lower proportions than what is found in the population); this adjusts values for this stratum so they count more heavily in the overall analysis
    • When the weight associated with a stratum < 1, that stratum is said to have been OVERREPRESENTED (i.e., included in the sample in higher proportions than what is found in the population); this adjusts values for this stratum so they count less heavily in the overall analysis
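
The Python sketch below walks through steps 2-5 for a single characteristic (sex), using hypothetical population and sample proportions; the stratum names, proportions, and survey responses are illustrative only:

  # Steps 2-3: population and sample proportions for each stratum (hypothetical values)
  population = {"male": 0.49, "female": 0.51}   # e.g., from census data
  sample = {"male": 0.40, "female": 0.60}       # proportions observed in the sample

  # Step 4: weight = population proportion / sample proportion
  weights = {stratum: population[stratum] / sample[stratum] for stratum in population}
  print({s: round(w, 3) for s, w in weights.items()})   # males ~1.225 (underrepresented), females 0.85 (overrepresented)

  # Step 5: apply the weights, e.g., a weighted mean of a survey response,
  # where each respondent counts according to the weight of their stratum
  respondents = [("male", 4), ("female", 5), ("female", 3), ("male", 2)]   # (stratum, response)
  weighted_mean = (sum(weights[s] * y for s, y in respondents) /
                   sum(weights[s] for s, _ in respondents))
  print(round(weighted_mean, 2))   # 3.41, versus an unweighted mean of 3.5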

Generalizability: A Function of Representativeness and Sample Size

Generalizability relies heavily on the representativeness of the sample (i.e., the extent to which a sample’s composition mirrors that of the population).  With probability sampling, members of a population have a known chance of selection, which minimizes selection bias and makes it more likely that the sample composition accurately reflects the characteristics of the population.  With non-probability sampling, however, SAMPLING BIAS and/or SELF-SELECTION BIAS may result in a non-representative sample: the sample composition does not accurately reflect the characteristics of the population.

Generalizability also relies heavily on sample size.  Larger sample sizes “typically do a better job at capturing the characteristics present in the population than do smaller samples” (Meier, Brudney, and Bohte, 2011, p. 178).  Larger sample sizes tend to provide more reliable sample statistics and more precise estimates of population parameters.  Furthermore, larger sample sizes increase the statistical power of a study, making it easier to identify statistically significant relationships and differences.

Populations vs. Samples

“A POPULATION is the total set of items that we are concerned about” (Meier, Brudney, and Bohte, 2011,  p. 173).  In other words, the population is the complete set of individuals or items that share a common characteristic or set of characteristics.  We are often interested in population PARAMETERS, i.e., numerical values that are fixed and describe a characteristic of a population, such as the population mean (μ), variance (σ²), and standard deviation (σ).  

“A SAMPLE is a subset of a population” (Meier, Brudney, and Bohte, 2011, p. 173).  There are two different types of samples: probability samples and non-probability samples. In a PROBABILITY SAMPLE, all members of the population have a KNOWN CHANCE of being selected as part of the sample.  To construct a probability sample, you will need to obtain a list of the entire population; this list then serves as the SAMPLING FRAME from which the sample will be selected/drawn. An example of a probability sample is a RANDOM SAMPLE, in which all members of the population have an equal chance of being selected in a sample. In a NON-PROBABILITY SAMPLE, some members of the population have NO CHANCE of being selected as part of the sample (in other words, the probability of selection cannot be determined). An example of a non-probability sample is a CONVENIENCE SAMPLE, in which the sample is selected based on convenience (i.e., as a result of being easy to contact or reach).

“A STATISTIC is a measure that is used to summarize a sample” (Meier, Brudney, and Bohte, 2011, p. 173), such as the measures of central tendency (ex: sample mean, X̄) and dispersion (ex: sample standard deviation, s) for a variable.  In order to treat sample findings as GENERALIZABLE to the population (i.e., use sample statistics as reliable estimates of the population parameters), the sample should be a probability sample.

Why Are Only Probability Samples Generalizable?

Probability samples are more representative of the population.  Furthermore, in probability sampling, the sampling distribution of the sample statistic (e.g., sample mean) can be determined based on statistical principles.  Thus, probability sampling allows us to calculate measures such as margins of error and confidence levels, which account for uncertainty in our sample statistics and capture how reliably they estimate population parameters. 

In contrast, non-probability samples lack a clear and defined sampling distribution, making it impossible to accurately estimate the variability of the sample statistic.

Inferential Statistics: The Basics

INFERENTIAL STATISTICS are “quantitative techniques [that can be used] to generalize from a sample to a population” (Meier, Brudney, and Bohte, 2011, p. 173).  When done correctly and with a large enough sample, the results obtained from a sample can be generalized to the population from which the sample was taken, with a known MARGIN OF ERROR that provides a range around a sample estimate within which the true population parameter is expected to lie (once this range is added to and subtracted from our point estimate, we call the result a CONFIDENCE INTERVAL) and a CONFIDENCE LEVEL that indicates the probability that the population parameter falls within this interval.  Margin of error, confidence intervals, and confidence levels help quantify how precise our estimates are.

For example, every time you hear a news broadcast report President Biden’s job approval rating, you are receiving inferences based on a sample of the population.  Naturally, it would be too costly and take too long to contact everyone in the United States to ask them how well Biden is doing as president.  Instead, a random sample of Americans is used to generate Biden’s job approval rating.  Then, depending on the sample size, the MOE is calculated; this accounts for the variability in the estimate that results from not asking every American how well Biden is doing.  If CNN reports that Biden’s job approval is 44% with a MOE of ±3 percentage points and a confidence level of 95%, we are 95% confident that the true job approval rating lies between 41% and 47%.
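
The arithmetic behind such a report can be sketched as follows; the sample size of 1,000 is a hypothetical value chosen because it yields a margin of error of roughly ±3 percentage points at a 95% confidence level:

  import math

  p = 0.44     # sample proportion approving of the president
  n = 1000     # hypothetical sample size
  z = 1.96     # critical value for a 95% confidence level

  se = math.sqrt(p * (1 - p) / n)   # standard error of a proportion
  moe = z * se                      # margin of error, about 0.031 (roughly 3 points)

  print(round(p - moe, 3), round(p + moe, 3))   # confidence interval: about 0.409 to 0.471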

Scale/Index Variables: A Measurement Technique Based on Z-Scores  

“A SCALE or INDEX is a composite measure combining several variables [or items] into a single unified measure of a concept” (Meier, Brudney, and Bohte, 2011, p. 144).  Scale/index variables are useful for several reasons:

  1. They can make analyzing a concept less complicated by reducing the number of variables
  2. They allow for more detailed analysis by
    • providing a more reliable and comprehensive measure of the underlying concept than any single item could provide on its own
    • transforming nominal level data into interval/ratio level data through summation (as the “scale” term indicates)
  3. They provide a clearer interpretation of the data by summarizing the information from multiple items into single scores, making it easier to communicate findings

If two or more variables are measured along the same scale (for instance, binary dummy variables), the values for these variables for each observation can simply be added together to create a SUMMATIVE SCALE/INDEX variable.  If we have three such variables, the resulting summative scale/index variable will range from 0 to 3, use real numbers, have equal intervals between categories, and have an absolute zero point.  These characteristics describe a ratio-level variable.  You could also divide the summative scale/index variable by the number of variables (for this example, dividing by 3) to transform it into a MEAN SCALE/INDEX variable that captures the average across all three variables, using the original scale (for this example, ranging from 0 to 1).
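
A minimal Python sketch of this calculation, using three hypothetical binary items per respondent:

  # Each tuple holds three 0/1 (dummy) items for one observation
  respondents = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]

  summative = [sum(items) for items in respondents]                 # ranges from 0 to 3
  mean_index = [sum(items) / len(items) for items in respondents]   # ranges from 0 to 1

  print(summative)                           # [2, 3, 1]
  print([round(m, 2) for m in mean_index])   # [0.67, 1.0, 0.33]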

If the variables you want to use when constructing a scale/index variable are measured along different scales, you would first standardize your variables so they are measured using the same scale (specifically, each following a standard normal distribution).  Then, the standardized values (i.e., z-scores) for these variables for each observation can simply be added together to create a summative scale/index variable with the characteristics of an interval-level variable.
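
The sketch below illustrates this with two hypothetical items measured on very different scales: each item is converted to z-scores, and the z-scores are then summed for each observation:

  import statistics

  item_a = [2, 4, 6, 8]            # hypothetical item on one scale
  item_b = [150, 250, 200, 400]    # hypothetical item on a very different scale

  def z_scores(values):
      mean = statistics.mean(values)
      sd = statistics.stdev(values)             # sample standard deviation (n - 1)
      return [(v - mean) / sd for v in values]

  z_a, z_b = z_scores(item_a), z_scores(item_b)
  index = [a + b for a, b in zip(z_a, z_b)]     # summative index of standardized items
  print([round(v, 2) for v in index])           # [-2.09, -0.39, -0.08, 2.55]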