**R-squared** is a measure of how well a linear regression model fits the data. It can be interpreted as the proportion of the variance of the outcome Y that is explained by the linear regression model.

It is a number between 0 and 1 (0 ≤ R^{2} ≤ 1). The closer its value is to 1, the more of the variability in the outcome the model explains. A value of R^{2} = 0 means that the model explains none of the variability in the outcome Y.

On the other hand, the **correlation coefficient r** is a measure that quantifies the strength of the linear relationship between 2 variables.

r is a number between -1 and 1 (-1 ≤ r ≤ 1):

- **A value of r close to -1** means that there is a negative linear relationship between the variables (when one increases, the other decreases, and vice versa)
- **A value of r close to 0** indicates that the 2 variables are not correlated (no linear relationship exists between them)
- **A value of r close to 1** indicates a positive linear relationship between the 2 variables (when one increases, the other increases too)
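As a quick numeric illustration (a minimal sketch using NumPy and synthetic data), here is r computed for three constructed datasets: a perfect positive linear relationship, a perfect negative one, and independent noise:

```python
import numpy as np

x = np.arange(1000, dtype=float)

# Perfect positive linear relationship: r = 1
r_pos = np.corrcoef(x, 3 * x + 2)[0, 1]

# Perfect negative linear relationship: r = -1
r_neg = np.corrcoef(x, -3 * x + 2)[0, 1]

# Independent random noise: r close to 0
rng = np.random.default_rng(0)
r_none = np.corrcoef(x, rng.standard_normal(x.size))[0, 1]

print(r_pos, r_neg, r_none)
```

Note that r only measures *linear* association: two variables can be strongly related in a nonlinear way (e.g. Y = X^{2} on a symmetric range) and still have r close to 0.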

Here are 3 plots that show the relationship between 2 variables with different correlation coefficients:

- The left one was drawn with a coefficient r = 0.80
- The middle one with r = -0.09
- And the right one with r = -0.76

Below we will discuss the relationship between r and R^{2} in the context of linear regression without diving too deep into the mathematical details.

We start with the special case of a simple linear regression and then discuss the more general case of a multiple linear regression.

## R-squared vs r in the case of a simple linear regression

We’ve seen that both r and R-squared measure the strength of the linear relationship between 2 variables, so how do they relate in the case of a simple linear regression?

When we’re dealing with a simple linear regression:

Y = β_{0} + β_{1}X + ε

**R-squared will be the square of the correlation between the independent variable X and the outcome Y**:

R^{2} = Cor(X, Y)^{2}
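This identity is easy to check numerically. The sketch below (synthetic data, NumPy only) fits a simple linear regression with `np.polyfit`, computes R^{2} from its definition 1 − SS_res / SS_tot, and compares it to the squared correlation between X and Y:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 1.5 + 0.8 * x + rng.normal(0, 2, 200)  # Y = b0 + b1*X + noise

# Fit Y = b0 + b1*X by least squares (polyfit returns [b1, b0])
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# R-squared from its definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Squared correlation between X and Y
r = np.corrcoef(x, y)[0, 1]

print(r_squared, r ** 2)  # the two values agree
```

This holds for any ordinary least squares fit that includes an intercept term.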

## R-squared vs r in the case of multiple linear regression

In simple linear regression we had 1 independent variable X and 1 dependent variable Y, so calculating the correlation between X and Y was straightforward.

In multiple linear regression we have more than 1 independent variable, so there is no single correlation r between X and Y that we can compute.

When dealing with multiple linear regression:

Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{3}X_{3} + β_{4}X_{4} + … + ε

**R-squared will be the square of the correlation between the predicted/fitted values of the linear regression (Ŷ) and the outcome (Y)**:

R^{2} = Cor(Ŷ, Y)^{2}
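We can verify this in the multiple-regression setting as well. The sketch below (synthetic data, NumPy only) fits a regression with 3 independent variables via `np.linalg.lstsq` and compares R^{2} to the squared correlation between the fitted values Ŷ and the outcome Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))                          # 3 independent variables
y = 2 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 1, n)

# Fit multiple linear regression with an intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# R-squared from its definition
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Squared correlation between the fitted values and the outcome
r_yhat_y = np.corrcoef(y_hat, y)[0, 1]

print(r_squared, r_yhat_y ** 2)  # the two values agree
```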

Note that in the special case of simple linear regression, the fitted values Ŷ are an exact linear function of X, so:

Cor(X, Ŷ) = ±1

(the sign being that of the slope β_{1}). So:

Cor(X, Y) = ±Cor(Ŷ, Y)

Which is why, in that special case, the squares coincide:

R^{2} = Cor(Ŷ, Y)^{2} = Cor(X, Y)^{2}
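To make the simple-regression special case concrete (a minimal sketch with synthetic data): since Ŷ is a linear function of X, the correlation between X and Ŷ has absolute value 1, with the sign of the fitted slope:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 100)
y = 4 - 2 * x + rng.normal(0, 1, 100)    # true slope is negative

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# Fitted values are a linear function of X, so |Cor(X, Ŷ)| = 1
r_x_yhat = np.corrcoef(x, y_hat)[0, 1]
print(r_x_yhat)  # -1 here, since the fitted slope is negative
```

Squaring removes the sign, which is why R^{2} equals Cor(X, Y)^{2} regardless of the direction of the relationship.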