## What is correlation?

Correlation is a statistical measure of the strength of the relationship between two variables. For example, we can study the relationship between a country's GDP per capita and the happiness level of its people. We can plot these two variables on a graph, where the x-axis represents the independent (explanatory) variable and the y-axis represents the dependent (response) variable. In this example we are trying to determine whether people's happiness levels depend on GDP per capita, so GDP per capita is the independent variable (x-axis) and the happiness level is the dependent variable (y-axis).

Let's look at a scatter plot of the above example that illustrates the relationship between the 2 variables.

A scatter plot lets us see whether two variables are linearly related, and the correlation coefficient quantifies that relationship. The correlation coefficient is a number between -1 and 1: its magnitude represents the strength of the linear relationship and its sign represents the direction.
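To make the coefficient concrete, here is a minimal sketch of the Pearson correlation coefficient computed from its definition (the covariance of the two variables divided by the product of their spreads). The function name `pearson_r` and the GDP and happiness values are made up for illustration; they are not real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, invented for this example only
gdp_per_capita = [5, 12, 20, 35, 48, 60]        # x: independent variable
happiness = [4.1, 4.8, 5.5, 6.0, 6.9, 7.2]      # y: dependent variable

r = pearson_r(gdp_per_capita, happiness)
print(f"r = {r:.3f}")
```

Because the invented happiness scores rise almost in step with GDP, the result is close to 1. Reversing the order of one list would give a value close to -1.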

## Correlations

Let's take a look at how the graph looks as the correlation goes from strong to weak.

### Very Strong Relationship

### Strong Relationship

### Moderate Relationship

### Weak Relationship

### No Relationship

From the above graphs we can see that with a strong positive correlation, the value of y increases as the value of x increases. Similarly, with a strong negative correlation, the value of y decreases as the value of x increases. You can see the difference between a positive and a negative correlation on the graphs below.

An important point to remember is that the correlation coefficient only captures linear relationships, so two variables can be strongly related and still have a correlation close to 0. Let's take a look at a scatter plot of a non-linear relationship.

## Correlation Doesn't Imply Causation

It's important to note that a correlation between two variables does not imply causation: if x is correlated with y, that does not mean that x causes y.

For example, a study showed that people who drank the most coffee were more likely to have lung cancer. Does that mean that, because drinking coffee is correlated with lung cancer, lung cancer is caused by drinking coffee? Not necessarily, because other factors can be involved. For example, people who drink coffee are also more likely to be smokers, and we know that smoking causes lung cancer. A factor like smoking, which is linked to both variables, is called a confounder.

So we can say that smoking (the confounder) causes lung cancer (y), and drinking coffee (x) is associated with smoking. As a result, lung cancer is associated with drinking coffee even if coffee has no effect on it.