Correlation Coefficient vs Regression Coefficient

Both the correlation coefficient and the regression coefficient rely on the assumption that the relationship between the data can be represented by a straight line. They are similar in many ways, but they serve different purposes.

Here’s a summary of the similarities and differences between the correlation coefficient, r, and the regression coefficient, β1 (in the linear model: Y = β0 + β1X):

Objective
• r measures the strength of the linear relationship between 2 variables: X and Y.
• β1 describes the relationship between 2 variables: X and Y.

Range
• r: [-1, 1]
• β1: (-∞, +∞)

Interpretation
• r close to -1 reflects a negative correlation between X and Y (as one increases, the other decreases).
• r close to 0 reflects no correlation between X and Y (no linear relationship exists between the 2 variables).
• r close to 1 reflects a positive correlation between X and Y (the 2 variables tend to increase and decrease together).
• β1 is the unit change in Y corresponding to a 1 unit change in X (for more details, see: Interpret Linear Regression Coefficients). Similar to the correlation coefficient r, β1 < 0 reflects a negative correlation between X and Y, and β1 > 0 reflects a positive correlation between X and Y.

Calculation
• β1 is calculated by minimizing the sum of squared residuals of the linear model Y = β0 + β1X. Specifically:
\(\beta_1 = \frac{covariance(X, Y)}{variance(X)}\)
• r is the rescaled version of the regression coefficient β1. Specifically:
\(r = \beta_1 \times \frac{standardDeviation(X)}{standardDeviation(Y)}\)

Strengths
• r does not depend on the units of measurement of X and Y.
• β1 quantifies the amount of change in Y for a 1 unit change in X, and enables us to calculate the value of Y for different values of X.

Limitations
• r cannot tell us what happens to Y if we change X by 1 unit, and does not tell us anything about the value of Y given X.
• β1 depends on the units of measurement of X and Y.
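The two calculation formulas above can be checked directly in R. A minimal sketch, using the built-in iris dataset (the same one used in the examples below):

```r
X = iris$Sepal.Width
Y = iris$Sepal.Length

# beta1 = covariance(X, Y) / variance(X)
beta1 = cov(X, Y) / var(X)

# r is the rescaled version of beta1: r = beta1 * sd(X) / sd(Y)
r = beta1 * sd(X) / sd(Y)

# both match R's built-in estimates
all.equal(beta1, unname(coef(lm(Y ~ X))[2]))  # TRUE
all.equal(r, cor(X, Y))                       # TRUE
```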

Note 1: Standardized regression coefficient

The standardized simple linear regression coefficient is equal to the correlation coefficient.

Explanation:

A standardized regression coefficient is obtained after running a regression model on standardized variables (i.e. rescaled variables that have a mean of 0 and a standard deviation of 1).

We can standardize the variable X, for example, by subtracting its mean from each value and dividing by its standard deviation:

\(standardizedX = \frac{X-mean(X)}{sd(X)}\)
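As a quick sketch of this formula (using the built-in iris data for illustration), the manually standardized variable indeed has a mean of 0 and a standard deviation of 1:

```r
X = iris$Sepal.Width

# standardize manually: subtract the mean, then divide by the standard deviation
standardizedX = (X - mean(X)) / sd(X)

round(mean(standardizedX), 10)   # 0 (up to floating-point error)
all.equal(sd(standardizedX), 1)  # TRUE
```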

In R we can use the function scale() to standardize X and Y:

standardizedX = scale(X)
standardizedY = scale(Y)

After standardizing the variables X and Y, we can calculate the regression coefficient of the model: Y = β0 + β1X:

lm(standardizedY ~ standardizedX)$coefficients

And compare it to the correlation coefficient:

cor(X, Y)

Here’s an example:

model = lm(scale(Sepal.Length) ~ scale(Sepal.Width), data = iris)
model$coefficients
# outputs:
#   (Intercept) scale(Sepal.Width) 
# -3.759491e-16      -1.175698e-01 

cor(iris$Sepal.Length, iris$Sepal.Width)
# outputs:
# -0.1175698

# which is equal to -1.175698e-01

Note 2: Linear regression with more than one predictor

In the case of a linear regression with more than one predictor (Y = β0 + β1X + β2Z), the partial correlation coefficient (between X and Y, controlling for Z) is the rescaled version of the regression coefficient β1 in the equation.

Explanation:

For the multivariable model:

Y = β0 + β1X + β2Z

The coefficient β1 is the unit change in Y for a 1 unit change in X, conditional on Z, so it can no longer be related to the correlation between X and Y alone.

Let’s look at an example in R:

X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

# run the linear model: Y = β0 + β1 X + β2 Z
lm(Y ~ X + Z)$coefficients

# (Intercept)           X           Z 
#   2.2491402   0.5955247   0.4719200 

In this model, the coefficient β1 = 0.5955247 reflects the relationship between X and Y adjusted for Z (i.e. conditional on Z).

Next, let’s remove the parts of X and Y that can be explained by Z to obtain residualsX and residualsY:

# remove the variation in X that can be explained by Z
residualsX = lm(X ~ Z)$residuals

# remove the variation of Y that can be explained by Z
residualsY = lm(Y ~ Z)$residuals

Now we can calculate the correlation coefficient between residualsX and residualsY, known as the partial correlation coefficient between X and Y, controlling for Z:

cor(residualsX, residualsY) # 0.5781005
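As a side note, the same partial correlation coefficient can also be computed from the 3 pairwise correlations, using the standard formula \(r_{XY.Z} = \frac{r_{XY} - r_{XZ} \, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}\). A sketch (redefining X, Y, and Z as above):

```r
X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

r_xy = cor(X, Y)
r_xz = cor(X, Z)
r_yz = cor(Y, Z)

# partial correlation between X and Y, controlling for Z
partial_r = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
partial_r  # 0.5781005
```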

From this partial correlation coefficient, we can recover the regression coefficient β1 = 0.5955247 that we found above, by using the formula:

\(r = \beta_1 \times \frac{standardDeviation(residualsX)}{standardDeviation(residualsY)}\)

So:

\(\beta_1 = r \times \frac{standardDeviation(residualsY)}{standardDeviation(residualsX)}\)

cor(residualsX, residualsY) * sd(residualsY) / sd(residualsX)
# 0.5955247

Conclusion: For a model with more than one predictor, β1 is no longer related to the simple correlation between X and Y; instead, the relationship is between β1 and the partial correlation coefficient.

Further reading