# Correlation Coefficient vs Regression Coefficient

Both the correlation coefficient and the regression coefficient rely on the assumption that the relationship between the variables is linear, i.e. that the data can be represented by a straight line. They are similar in many ways, but they serve different purposes.

Here’s a table that summarizes the similarities and differences between the correlation coefficient, r, and the regression coefficient, β:

| | Correlation coefficient, r | Regression coefficient, β |
|---|---|---|
| Measures | Strength and direction of the linear association between X and Y | Expected change in Y for a one-unit change in X |
| Range | Between −1 and 1 | Any value |
| Units | Unitless | Units of Y per unit of X |
| Symmetry | Symmetric: cor(X, Y) = cor(Y, X) | Asymmetric: regressing Y on X differs from regressing X on Y |
| Sign | Same sign as β | Same sign as r |

## Note 1: Standardized regression coefficient

The standardized simple linear regression coefficient is equal to the correlation coefficient.

### Explanation:

A standardized regression coefficient is obtained after running a regression model on standardized variables (i.e. rescaled variables that have a mean of 0 and a standard deviation of 1).

We can standardize the variable X, for example, by subtracting its mean from each value and dividing by its standard deviation:

$$standardizedX = \frac{X-mean(X)}{sd(X)}$$

In R, we can use the function `scale()` to standardize X and Y:

```r
standardizedX = scale(X)
standardizedY = scale(Y)
```
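As a quick check, the manual formula above and `scale()` agree (`scale()` returns a one-column matrix, so we flatten it with `as.vector()` before comparing):

```r
# standardize iris$Sepal.Width by hand and compare with scale()
x = iris$Sepal.Width
manual = (x - mean(x)) / sd(x)

all.equal(manual, as.vector(scale(x)))  # the two versions match
mean(manual)  # ~0 (up to floating-point error)
sd(manual)    # 1
```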

After standardizing the variables X and Y, we can calculate the regression coefficient of the model: Y = β0 + β1X:

```r
lm(standardizedY ~ standardizedX)$coefficients
```

And compare it to the correlation coefficient:

```r
cor(X, Y)
```

Here’s an example:

```r
model = lm(scale(Sepal.Length) ~ scale(Sepal.Width), data = iris)
model$coefficients
# outputs:
#        (Intercept) scale(Sepal.Width)
#      -3.759491e-16      -1.175698e-01
```

```r
cor(iris$Sepal.Length, iris$Sepal.Width)
# outputs:
# -0.1175698

# which is equal to -1.175698e-01
```
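This equality is a special case of the general relationship between the slope and the correlation in a simple regression, β1 = r × sd(Y)/sd(X): standardizing sets both standard deviations to 1, leaving β1 = r. A quick sketch:

```r
# in a simple regression, beta1 = r * sd(Y) / sd(X)
X = iris$Sepal.Width
Y = iris$Sepal.Length

beta1 = unname(lm(Y ~ X)$coefficients["X"])
r = cor(X, Y)

all.equal(beta1, r * sd(Y) / sd(X))  # the two quantities match
```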

## Note 2: Linear regression with more than one predictor

In the case of a linear regression with more than one predictor (Y = β0 + β1X + β2Z), the partial correlation coefficient between X and Y, controlling for Z, is a rescaled version of the regression coefficient β1 in the equation.

### Explanation:

For the multivariable model:

Y = β0 + β1X + β2Z

The coefficient β1 is the expected change in Y for a one-unit change in X, holding Z constant, so it can no longer be related to the simple correlation between X and Y alone.

Let’s look at an example in R:

```r
X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

# run the linear model: Y = β0 + β1*X + β2*Z
lm(Y ~ X + Z)$coefficients
# outputs:
# (Intercept)           X           Z
#   2.2491402   0.5955247   0.4719200
```

In this model, the coefficient β1 = 0.5955247 reflects the relationship between X and Y adjusted for Z (i.e. conditional on Z).

Next, let’s remove the parts of X and Y that can be explained by Z to obtain residualsX and residualsY:

```r
# remove the variation in X that can be explained by Z
residualsX = lm(X ~ Z)$residuals

# remove the variation in Y that can be explained by Z
residualsY = lm(Y ~ Z)$residuals
```

Now we can calculate the correlation coefficient between residualsX and residualsY, known as the partial correlation coefficient between X and Y, controlling for Z:

```r
cor(residualsX, residualsY)
# outputs:
# 0.5781005
```

From this partial correlation coefficient, we can obtain the regression coefficient, β1 = 0.5955247 that we found above, by using the formula:

$$r = \beta_1 \times \frac{sd(residualsX)}{sd(residualsY)}$$

So:

$$\beta_1 = r \times \frac{sd(residualsY)}{sd(residualsX)}$$

```r
cor(residualsX, residualsY) * sd(residualsY) / sd(residualsX)
# outputs:
# 0.5955247
```
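Equivalently, regressing residualsY directly on residualsX recovers the coefficient β1 from the full model (this is the Frisch–Waugh–Lovell theorem):

```r
# the slope of residualsY on residualsX equals beta1 from the full model
X = iris$Sepal.Width
Y = iris$Sepal.Length
Z = iris$Petal.Length

residualsX = lm(X ~ Z)$residuals
residualsY = lm(Y ~ Z)$residuals

lm(residualsY ~ residualsX)$coefficients["residualsX"]
# 0.5955247
```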

Conclusion: for a linear regression with more than one predictor, the relationship holds between β1 and the partial correlation coefficient, rather than the simple correlation coefficient.