Both the correlation coefficient and the regression coefficient rely on the assumption that the relationship between the data can be represented by a straight line. They are similar in many ways, but they serve different purposes.

Here’s a table that summarizes the similarities and differences between the correlation coefficient, r, and the regression coefficient, β:

|  | Correlation coefficient: r | Regression coefficient: β_{1} (in the linear model: Y = β_{0} + β_{1}X) |
|---|---|---|
| **Objective** | Measures the strength of the linear relationship between 2 variables, X and Y. | Describes the relationship between 2 variables, X and Y. |
| **Range** | [-1, 1] | (-∞, +∞) |
| **Interpretation** | • r close to -1 reflects a negative correlation between X and Y (as one increases, the other decreases).<br>• r close to 0 reflects no correlation between X and Y (no linear relationship exists between the 2 variables).<br>• r close to 1 reflects a positive correlation between X and Y (the 2 variables tend to increase and decrease together). | β_{1} is the change in Y corresponding to a 1 unit change in X (for more details, see: Interpret Linear Regression Coefficients).<br>Similar to the correlation coefficient r:<br>• β_{1} < 0 reflects a negative correlation between X and Y.<br>• β_{1} > 0 reflects a positive correlation between X and Y. |
| **Calculation** | The correlation coefficient r is the rescaled version of the regression coefficient β_{1}. Specifically: \(r = \beta_1 \times \frac{standardDeviation(X)}{standardDeviation(Y)}\) | β_{1} is calculated by minimizing the sum of squared residuals of the linear model Y = β_{0} + β_{1}X. Specifically: \(\beta_1 = \frac{covariance(X, Y)}{variance(X)}\) |
| **Strength** | Does not depend on the units of measurement of X and Y. | Quantifies the amount of change in Y for a 1 unit change in X, which enables us to calculate the value of Y for different values of X. |
| **Limitation** | Cannot tell us what happens to Y if we change X by 1 unit; does not tell us anything about the value of Y given X. | Depends on the units of measurement of X and Y. |
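Both formulas from the Calculation row can be checked directly in R. Here's a short sketch using the built-in iris dataset (the choice of variables is illustrative, not part of the discussion above):

```r
# illustrative variables from the built-in iris dataset
X = iris$Sepal.Width
Y = iris$Sepal.Length

# beta1 from the least-squares fit of the model: Y = beta0 + beta1*X
beta1 = lm(Y ~ X)$coefficients[2]

# beta1 equals covariance(X, Y) / variance(X)
cov(X, Y) / var(X)     # same value as beta1

# r equals beta1 rescaled by sd(X) / sd(Y)
beta1 * sd(X) / sd(Y)  # same value as cor(X, Y)
cor(X, Y)
```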

## Note 1: Standardized regression coefficient

The standardized simple linear regression coefficient is equal to the correlation coefficient.

### Explanation:

A standardized regression coefficient is obtained after running a regression model on standardized variables (i.e. rescaled variables that have a mean of 0 and a standard deviation of 1).

We can standardize the variable X, for example, by subtracting its mean from each value and dividing by its standard deviation:

\(standardizedX = \frac{X-mean(X)}{sd(X)}\)
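As a quick sanity check, this formula can be applied by hand in R (using an illustrative variable from the built-in iris dataset):

```r
# illustrative variable to standardize by hand
X = iris$Sepal.Width

# apply the formula: subtract the mean, divide by the standard deviation
standardizedX = (X - mean(X)) / sd(X)

mean(standardizedX)  # ~ 0 (up to floating-point error)
sd(standardizedX)    # 1
```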

In R we can use the function *scale()* to standardize X and Y:

```r
standardizedX = scale(X)
standardizedY = scale(Y)
```

After standardizing the variables X and Y, we can calculate the regression coefficient of the model standardizedY = β_{0} + β_{1} standardizedX:

```r
lm(standardizedY ~ standardizedX)$coefficients
```

And compare it to the correlation coefficient:

```r
cor(X, Y)
```

Here’s an example:

```r
model = lm(scale(Sepal.Length) ~ scale(Sepal.Width), data = iris)
model$coefficients
# outputs:
#        (Intercept) scale(Sepal.Width)
#      -3.759491e-16      -1.175698e-01

cor(iris$Sepal.Length, iris$Sepal.Width)
# outputs:
# -0.1175698
# which is equal to -1.175698e-01
```

## Note 2: Linear regression with more than one predictor

In the case of a multivariate linear regression (Y = β_{0} + β_{1}X + β_{2}Z), the partial correlation coefficient (between X and Y, controlling for Z) is the rescaled version of the regression coefficient β_{1} in the equation.

### Explanation:

For the multivariable model:

Y = β_{0} + β_{1}X + β_{2}Z

The coefficient β_{1} is the unit change in Y for a 1 unit change in X, **conditional on Z**, so it can no longer be related to the correlation between X and Y alone.

Let’s look at an example in R:

```r
X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

# run the linear model: Y = β0 + β1 X + β2 Z
lm(Y ~ X + Z)$coefficients
# (Intercept)           X           Z
#   2.2491402   0.5955247   0.4719200
```

In this model, the coefficient β_{1} = 0.5955247 reflects the relationship between X and Y adjusted for Z (i.e. conditional on Z).

Next, let’s remove the parts of X and Y that can be explained by Z to obtain residualsX and residualsY:

```r
# remove the variation in X that can be explained by Z
residualsX = lm(X ~ Z)$residuals

# remove the variation in Y that can be explained by Z
residualsY = lm(Y ~ Z)$residuals
```

Now we can calculate the correlation coefficient between residualsX and residualsY, known as the **partial correlation coefficient** between X and Y, controlling for Z:

```r
cor(residualsX, residualsY)
# 0.5781005
```

From this partial correlation coefficient, we can recover the regression coefficient β_{1} = 0.5955247 that we found above, using the formula:

\(r = \beta_1 \times \frac{standardDeviation(residualsX)}{standardDeviation(residualsY)}\)

So:

\(\beta_1 = r \times \frac{standardDeviation(residualsY)}{standardDeviation(residualsX)}\)

```r
cor(residualsX, residualsY) * sd(residualsY) / sd(residualsX)
# 0.5955247
```

Conclusion: For a multivariate model, the relationship holds between β_{1} and the partial correlation coefficient, rather than the simple correlation coefficient between X and Y.