Correlation vs. Causation

Correlation

CORRELATION refers to any relationship or statistical association between two variables.  If two variables are correlated, the variables appear to move together: as one variable changes, the other variable tends to change in a specific direction.  Two variables can display a POSITIVE CORRELATION (as the values for one variable increase, the values for the other variable increase) or a NEGATIVE CORRELATION (as the values for one variable increase, the values for the other variable decrease).  If two variables are UNCORRELATED, there is no apparent relationship between them.  

Positive and negative correlations can also be characterized by the strength of the relationship between the two variables: STRONG (a high degree of association between two variables), MODERATE (a noticeable but not perfect association between two variables), or WEAK (a low degree of association between two variables).

Researchers can check whether two variables are correlated by calculating their CORRELATION COEFFICIENT (most commonly PEARSON’S R), which measures the direction and strength of a linear relationship between two variables.  Pearson’s R is one of the most widely used statistics in both descriptive and inferential statistics.  Pearson’s R values range from -1 to 1:

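  • -1 indicates a perfect negative linear relationship between two variables: every data point falls exactly on a downward-sloping straight line, so as one variable increases, the other decreases by a constant proportional amount (not necessarily one unit for one unit)
  • 0 indicates no linear relationship between two variables (a nonlinear relationship may still exist)
  • 1 indicates a perfect positive linear relationship between two variables: every data point falls exactly on an upward-sloping straight line, so as one variable increases, the other increases by a constant proportional amount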
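
To make the scale concrete, here is a minimal sketch of Pearson’s R computed by hand in Python.  The function name pearson_r and the small data sets are invented for illustration; the calculation is the standard one (the covariance of the two variables divided by the product of their standard deviations).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
r_steep = pearson_r(x, [10 * v + 3 for v in x])           # y rises 10 units per unit of x
r_down = pearson_r(x, [-0.5 * v + 7 for v in x])          # y falls half a unit per unit of x
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x squared: related, but not linearly

print(r_steep, r_down, r_curve)
```

In this sketch the first two data sets lie exactly on straight lines, so R comes out at 1 and -1 regardless of how steep the line is.  The third data set follows a clear U-shaped pattern, yet R is 0 because the relationship is not linear.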

There are four possible explanations for a correlation between variable X and variable Y: (1) variable X causes variable Y (CAUSATION); (2) variable Y causes variable X (REVERSE CAUSATION); (3) the relationship between variable X and variable Y is simply a coincidence (RANDOM CHANCE); and (4) some other variable Z causes both variable X and variable Y (SPURIOUS RELATIONSHIP).  Because only the first of these means that X causes Y, correlation DOES NOT equal causation.

Example: Ice Cream Sales and Sunburns

There is a strong positive correlation between ice cream sales and sunburns: as ice cream sales increase, so do sunburns.  Does this mean the ice cream is causing sunburns?  Of course not!  As this illustrates, correlation DOES NOT imply that one variable causes the other variable to change.  What other factor helps explain this observed correlation between ice cream sales and sunburns?  Weather!

  • As it gets warmer, people eat more ice cream
  • During the summer months, when it’s warmer, people are more likely to go outside; that, combined with stronger, more direct sunlight, results in increased opportunities for sunburns

This is an example of a spurious relationship — an apparent causal relationship between two variables that is actually due to one or more other variables.  

Causation

In the context of hypothesis testing, CAUSALITY (i.e., whether one variable affects/leads to changes in another variable) is usually what we are interested in because it helps us understand mechanisms and underlying processes, thereby allowing us to make accurate predictions.

Demonstrating Causation

To demonstrate causation, three conditions must be met:

  1. The variables must be correlated
  2. The cause must precede the effect
  3. Other possible causes/explanations of the variation observed in the dependent variable must be ruled out