Both the correlation and regression coefficients rely on the hypothesis that the data can be represented by a straight line. They are similar in many ways, but they serve different purposes.
Here’s a table that summarizes the similarities and differences between the correlation coefficient, r, and the regression coefficient, β:
 | Correlation coefficient: r | Regression coefficient: β1 (in the linear model: Y = β0 + β1X) |
---|---|---|
Objective | Measures the strength of the linear relationship between 2 variables: X and Y. | Describes the relationship between 2 variables: X and Y. |
Range | [-1, 1] | (-∞, +∞) |
Interpretation | • r close to -1 reflects a negative correlation between X and Y (as one increases, the other decreases). • r close to 0 reflects no correlation between X and Y (no linear relationship exists between the 2 variables). • r close to 1 reflects a positive correlation between X and Y (the 2 variables tend to increase and decrease together). | β1 is the change in Y corresponding to a 1 unit change in X (for more details, see: Interpret Linear Regression Coefficients). Similar to the correlation coefficient r: • β1 < 0 reflects a negative relationship between X and Y. • β1 > 0 reflects a positive relationship between X and Y. |
Calculation | The correlation coefficient r is the rescaled version of the regression coefficient β1. Specifically: \(r = \beta_1 × \frac{standardDeviation(X)}{standardDeviation(Y)}\) | β1 is calculated by minimizing the sum of squared residuals of the linear model: Y = β0 + β1X. Specifically: \(\beta_1 = \frac{covariance(X, Y)}{variance(X)}\) |
Strength | Does not depend on the units of measurement of X and Y. | Quantifies the amount of change in Y for a 1 unit change in X. β1 enables us to calculate the value of Y for different values of X. |
Limitation | Cannot tell us what happens to Y if we change X by 1 unit. Does not tell us anything about the value of Y given X. | Depends on the units of measurement of X and Y. |
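The two formulas in the Calculation row can be verified numerically. Here is a minimal sketch in R using the built-in iris dataset (with Sepal.Width as X and Sepal.Length as Y):

```r
X = iris$Sepal.Width
Y = iris$Sepal.Length

# beta1 from the covariance/variance formula
beta1 = cov(X, Y) / var(X)

# it matches the slope estimated by lm()
lm(Y ~ X)$coefficients["X"]

# rescaling beta1 by sd(X)/sd(Y) recovers the correlation coefficient
beta1 * sd(X) / sd(Y)
cor(X, Y)
# -0.1175698
```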
Note 1: Standardized regression coefficient
The standardized simple linear regression coefficient is equal to the correlation coefficient.
Explanation:
A standardized regression coefficient is obtained after running a regression model on standardized variables (i.e. rescaled variables that have a mean of 0 and a standard deviation of 1).
We can standardize the variable X, for example, by subtracting its mean from each value and dividing by its standard deviation:
\(standardizedX = \frac{X-mean(X)}{sd(X)}\)
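For instance, applying this formula directly in R (using iris$Sepal.Width as a stand-in for X):

```r
X = iris$Sepal.Width

# standardize manually: subtract the mean, divide by the standard deviation
standardizedX = (X - mean(X)) / sd(X)

# the standardized variable has mean 0 (up to floating-point error) and sd 1
mean(standardizedX)
sd(standardizedX)
# 1
```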
In R we can use the function scale() to standardize X and Y:
standardizedX = scale(X)
standardizedY = scale(Y)
After standardizing the variables X and Y, we can calculate the regression coefficient of the model: Y = β0 + β1X:
lm(standardizedY ~ standardizedX)$coefficients
And compare it to the correlation coefficient:
cor(X, Y)
Here’s an example:
model = lm(scale(Sepal.Length) ~ scale(Sepal.Width), data = iris)
model$coefficients
# outputs:
#        (Intercept) scale(Sepal.Width)
#      -3.759491e-16      -1.175698e-01

cor(iris$Sepal.Length, iris$Sepal.Width)
# outputs:
# -0.1175698
# which is equal to -1.175698e-01
Note 2: Linear regression with more than one predictor
In the case of a multivariable linear regression (Y = β0 + β1X + β2Z), the partial correlation coefficient (between X and Y, controlling for Z) is the rescaled version of the regression coefficient β1 in the equation.
Explanation:
For the multivariable model:
Y = β0 + β1X + β2Z
The coefficient β1 is the change in Y for a 1 unit change in X, holding Z constant, so it can no longer be related to the simple correlation between X and Y alone.
Let’s look at an example in R:
X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

# run the linear model: Y = β0 + β1 X + β2 Z
lm(Y ~ X + Z)$coefficients
# (Intercept)           X           Z
#   2.2491402   0.5955247   0.4719200
In this model, the coefficient β1 = 0.5955247 reflects the relationship between X and Y adjusted for Z (i.e. conditional on Z).
Next, let’s remove the parts of X and Y that can be explained by Z to obtain residualsX and residualsY:
# remove the variation in X that can be explained by Z
residualsX = lm(X ~ Z)$residuals

# remove the variation of Y that can be explained by Z
residualsY = lm(Y ~ Z)$residuals
Now we can calculate the correlation coefficient between residualsX and residualsY, known as the partial correlation coefficient between X and Y, controlling for Z:
cor(residualsX, residualsY) # 0.5781005
From this partial correlation coefficient, we can recover the regression coefficient β1 = 0.5955247 found above, using the formula:
\(r = \beta_1 × \frac{standardDeviation(residualsX)}{standardDeviation(residualsY)}\)
So:
\(\beta_1 = r × \frac{standardDeviation(residualsY)}{standardDeviation(residualsX)}\)
cor(residualsX, residualsY) * sd(residualsY) / sd(residualsX) # 0.5955247
Conclusion: For a multivariable model, the same rescaling relationship holds, but between β1 and the partial correlation coefficient rather than the simple correlation coefficient.
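A related check (an instance of the Frisch–Waugh–Lovell theorem, not named above): regressing residualsY on residualsX directly returns the same β1 as the full model. A minimal sketch, recomputing the variables from the example:

```r
X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

# residualize X and Y on Z, as in the example above
residualsX = lm(X ~ Z)$residuals
residualsY = lm(Y ~ Z)$residuals

# the slope of residualsY on residualsX equals beta1 from the full model
lm(residualsY ~ residualsX)$coefficients["residualsX"]
# 0.5955247

lm(Y ~ X + Z)$coefficients["X"]
# 0.5955247
```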