# Interpret Interactions in Linear Regression

For a linear regression model with interaction:

`Y = β0 + β1 X1 + β2 X2 + β3 X1X2`

The coefficient of the interaction term (β3) is the change in the effect of X1 on Y for a 1 unit increase in X2, and vice versa.
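One way to see this is to collect the terms that multiply X1. The short derivation below uses only the symbols from the model above:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \\
  &= \beta_0 + \beta_2 X_2 + (\beta_1 + \beta_3 X_2)\, X_1
\end{aligned}
```

So the slope of X1 is β1 + β3 X2, which changes by β3 for each 1 unit change in X2. Collecting the terms in X2 instead gives the slope β2 + β3 X1.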

For example:

Suppose we used linear regression to study the effect of `physical exercise` and `protein intake` on the `amount of muscle` the body can build in 1 month.

Here’s the model’s output:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`
⚠ Why did we include an interaction term in this model?
Including an interaction term is reasonable since we expect the effect of protein intake on muscle mass to differ depending on whether or not the person exercises. For more information, see Why and When to Include Interactions in a Regression Model.

So how do we interpret these results?

First notice that the p-value associated with the interaction term is < 0.05. So there is a statistically significant interaction between protein intake and physical exercise. In other words, there is evidence that a synergy effect exists between these 2 variables — their combination is more powerful than the sum of their effects.

Another thing to look for when evaluating a model with interaction is the increase in R-squared for the model with interaction compared to the model without interaction.

In this case, R-squared increased from 16% to 19%, which means that the model with interaction explains 3 percentage points more of the outcome variability. This provides evidence that the model with interaction is superior.
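As a rough sketch of how such a comparison could be run, the snippet below fits both models by ordinary least squares with NumPy on simulated data. The data, the sample size, the noise level, and the helper `r_squared` are all assumptions made for illustration, so the printed values will not match the 16% and 19% reported above.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least squares fit of y on X (X must include the intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical data, simulated only for illustration (not the article's sample)
rng = np.random.default_rng(0)
n = 500
exercise = rng.integers(0, 2, size=n)   # binary: 0 or 1
protein = rng.integers(0, 2, size=n)    # binary: 0 or 1
noise = rng.normal(0.0, 2.0, size=n)
muscle = 22.1 + 1.0 * exercise - 0.4 * protein + 1.4 * exercise * protein + noise

ones = np.ones(n)
X_main = np.column_stack([ones, exercise, protein])                       # no interaction
X_inter = np.column_stack([ones, exercise, protein, exercise * protein])  # with interaction

r2_main = r_squared(X_main, muscle)
r2_inter = r_squared(X_inter, muscle)
print(f"R-squared without interaction: {r2_main:.3f}")
print(f"R-squared with interaction:    {r2_inter:.3f}")
```

Keep in mind that R-squared cannot decrease when a term is added to a least squares model, so the size of the increase is best read together with the p-value of the interaction term.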

Next, we will interpret the model’s coefficients. This interpretation will depend on the type of the variables `exercise` and `protein`, and how they were coded. So let’s explore some of these options:

## 1. Interaction between 2 categorical variables

Here’s the regression equation:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

With the variables coded as follows:

• Muscle Mass: Total body muscle mass in Kilograms.
• Exercise: A binary categorical variable:
• 0: Does not exercise.
• 1: Exercises more than twice per week.
• Protein: A binary categorical variable:
• 0: Does not use any protein supplements.
• 1: Uses protein supplements.

### Interpreting the intercept β0 = 22.1:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

β0 is the predicted muscle mass when the variables exercise and protein both equal zero:

`The average muscle mass for a person who does not exercise and does not take any protein supplements is 22.1 Kg.`

### Interpreting the coefficient of exercise β1 = 1:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

If we set protein = 0 (to make the “interaction” and the “protein” terms disappear), then β1 will be the difference in the average muscle mass between groups where exercise = 0 and exercise = 1.

So β1 does not reflect the effect of exercise in general, but only in a subpopulation where protein = 0.

The interpretation will be:

`In the subpopulation of people who do not take protein supplements, those who exercise are expected to have an additional 1 Kg of muscle mass compared to those who do not exercise.`

### Interpreting the coefficient of protein β2 = -0.4:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

In the group where exercise = 0, β2 is the difference in the average muscle mass between those who take proteins versus those who do not.

Notice, however, in the table above that this coefficient is not statistically significant (p-value = 0.34).

So we conclude that:

`Among people who do not exercise, there is no significant difference in muscle mass between those who take protein and those who don't.`
⚠ Note: The hierarchical principle
If you include an interaction between 2 variables X1 and X2 in a regression model, then the main effects of X1 and X2 should also be included even if they were not statistically significant.

### Interpreting the coefficient of the interaction exercise × protein β3 = 1.4:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

β3 is the difference in the effect of protein on muscle mass between those who exercise and those who do not. Or vice-versa, it is the difference in the effect of exercise on muscle mass between those who take protein supplements and those who don’t.

`The effectiveness of protein supplements on muscle mass increased by 1.4 Kg when used with exercising compared to their use without exercising.`
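To make the four coefficients concrete, here is a minimal sketch that plugs each combination of the two binary variables into the fitted equation. The function name `predict_muscle` and the dictionary `cells` are illustrative names, not part of the original analysis.

```python
def predict_muscle(exercise, protein):
    """Predicted muscle mass (Kg) from the fitted equation above."""
    return 22.1 + 1.0 * exercise - 0.4 * protein + 1.4 * exercise * protein

# Predicted mean for each of the four groups, keyed by (exercise, protein)
cells = {(e, p): predict_muscle(e, p) for e in (0, 1) for p in (0, 1)}
for (e, p), value in cells.items():
    print(f"exercise={e}, protein={p}: {value:.1f} Kg")

# Effect of protein within each exercise group
protein_effect_no_exercise = cells[(0, 1)] - cells[(0, 0)]  # equals beta2
protein_effect_exercise = cells[(1, 1)] - cells[(1, 0)]     # equals beta2 + beta3

# The interaction coefficient is the difference between these two effects
interaction = protein_effect_exercise - protein_effect_no_exercise
print(f"difference in protein effect: {interaction:.1f} Kg")
```

The same 1.4 Kg comes out if we instead compare the effect of exercise among those who take protein with the effect of exercise among those who do not.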

## 2. Interaction between a continuous and a categorical variable

Let’s rewrite the regression equation:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

With the variables coded as follows:

• Muscle Mass: Total body muscle mass in Kilograms.
• Exercise: A binary categorical variable:
• 0: Does not exercise.
• 1: Exercises more than twice per week.
• Protein: Daily intake in units of 100 grams.

### Interpreting the intercept β0 = 22.1:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

β0 is the predicted muscle mass when the variables exercise and protein both equal zero.

Since it is implausible that protein intake will be 0 grams per day, this coefficient cannot be directly interpreted.

Note that if the variable protein was centered (by subtracting its mean value from each observation), then protein = 0 will represent the average amount of protein in the sample. And so β0 = 22.1 Kg will be the predicted muscle mass for a person who does not exercise and who consumes an average amount of protein daily.
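It can help to see what centering does to the equation. In the sketch below, m (the sample mean of protein) and P_c = Protein - m (the centered variable) are notation introduced here for illustration, and E stands for exercise:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 E + \beta_2 (P_c + m) + \beta_3 E\,(P_c + m) \\
  &= (\beta_0 + \beta_2 m) + (\beta_1 + \beta_3 m)\, E + \beta_2 P_c + \beta_3 E\, P_c
\end{aligned}
```

So a model refit on the centered variable keeps the same protein and interaction coefficients, while the intercept and the exercise coefficient generally take different numerical values from those of the uncentered fit.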

### Interpreting the coefficient of exercise β1 = 1:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

For the group where protein = 0, then β1 will be the difference in the average muscle mass between those who exercise (exercise = 1) and those who don’t (exercise = 0).

Again, setting the protein intake equal to zero is unrealistic, so this coefficient is also not directly interpretable.

However, if the variable protein was centered, then we can say that: among average consumers of protein (where protein = 0), the expected difference between those who exercise and those who don’t is β1 = 1 Kg in muscle mass.

### Interpreting the coefficient of protein β2 = -0.4:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

β2 is the average difference in muscle mass across those who do not exercise (where exercise = 0) but differ by 1 unit (i.e. 100 g) in daily protein intake.

But because this coefficient is not statistically significant (p-value = 0.34), we can conclude that:

`Among the population of people who do not exercise, no significant difference in muscle mass is to be expected between those who differ in their protein intake.`

### Interpreting the coefficient of the interaction exercise × protein β3 = 1.4:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

β3 is the difference in the effectiveness of protein intake on muscle mass between those who exercise and those who do not. Or vice-versa, it is the difference in the effectiveness of exercise on muscle mass between those who differ in their protein intake by 1 unit (i.e. 100 g).

To better understand this coefficient, let’s do some calculations:

Recall that the regression equation is:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

For people who do not exercise (exercise = 0), this equation becomes:

`Muscle Mass = 22.1 + 1 × 0 - 0.4 Protein + 1.4 × 0 × Protein` `⇒ Muscle Mass = 22.1 - 0.4 Protein (eq.1)`

For people who do exercise (exercise = 1), the equation becomes:

`Muscle Mass = 22.1 + 1 × 1 - 0.4 Protein + 1.4 × 1 × Protein` `⇒ Muscle Mass = 23.1 + 1 Protein (eq.2)`

From equations 1 and 2, we can easily see that a 1 unit (i.e. 100 g) increase in protein intake has a larger effect on muscle mass for those who exercise (coefficient = 1) compared to those who don’t (coefficient = -0.4).

So the coefficient of the interaction β3 = 1.4 can be interpreted as follows:

`The effect on muscle mass of a 100 g increase in protein intake is 1.4 Kg more for those who exercise versus those who don't.`
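The two group equations can also be computed directly from the coefficients. This is a minimal sketch, where the constants `B0` to `B3` and the helper `group_line` are illustrative names:

```python
# Coefficients from the regression equation above
B0, B1, B2, B3 = 22.1, 1.0, -0.4, 1.4

def group_line(exercise):
    """Return (intercept, protein slope) of the regression line for one exercise group."""
    intercept = B0 + B1 * exercise
    slope = B2 + B3 * exercise
    return intercept, slope

for exercise in (0, 1):
    intercept, slope = group_line(exercise)
    print(f"exercise={exercise}: Muscle Mass = {intercept:.1f} + ({slope:.1f}) Protein")

# The interaction coefficient is the gap between the two protein slopes
slope_difference = group_line(1)[1] - group_line(0)[1]
print(f"difference in slopes: {slope_difference:.1f}")
```

Plotted against protein intake, these are two straight lines with different slopes, and β3 is the difference between those slopes.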

## 3. Interaction between 2 continuous variables

Let’s rewrite the regression equation one last time:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

With the variables coded as follows:

• Muscle Mass: Total body muscle mass in Kilograms.
• Exercise: Duration in hours per day.
• Protein: Daily intake in units of 100 grams.

### Interpreting the intercept β0 = 22.1:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

β0 is the predicted muscle mass when the variables exercise and protein both equal zero.

Since it is implausible that protein intake will be 0 grams per day, this coefficient cannot be directly interpreted.

If the variable protein was centered, then β0 = 22.1 Kg will be the predicted muscle mass for a person who does not exercise (exercise = 0 hours per day) and consumes an average amount of protein daily (where protein = 0, which now corresponds to the mean of this variable).

### Interpreting the coefficient of exercise β1 = 1:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

For the group where protein = 0, β1 will be the difference in the average muscle mass between those who differ by 1 hour of exercise daily.

Again, because setting the protein intake equal to zero is unrealistic, this coefficient is also not interpretable.

If the variable protein was centered, then for the average consumers of protein (where protein = 0), the expected difference in muscle mass between those who differ by 1 hour of exercise daily will be β1 = 1 Kg.

### Interpreting the coefficient of protein β2 = -0.4:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

For the group of people who do not exercise (where exercise = 0 hours per day), β2 will be the average difference in muscle mass between those who differ by 1 unit (100 g) in protein intake.

But because this coefficient is not statistically significant (p-value = 0.34), we can conclude that:

`No significant difference in muscle mass is to be expected between people who do not exercise but differ in their protein intake.`

### Interpreting the coefficient of the interaction exercise × protein β3 = 1.4:

`Muscle Mass = 22.1 + 1 Exercise - 0.4 Protein + 1.4 Exercise×Protein`

Since this coefficient is positive, the longer the daily exercise, the stronger the effect of protein will be.

Specifically:

`Each additional 100 g of protein daily increases the effectiveness of exercise on muscle mass by 1.4 Kg.`

And vice-versa:

`Each additional hour of daily exercise increases the effectiveness of protein consumed on muscle mass by 1.4 Kg.`
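With two continuous variables, the effect of each one becomes a function of the other. The sketch below evaluates those functions at a few values; the helper names `protein_effect` and `exercise_effect` and the chosen values are illustrative assumptions:

```python
def protein_effect(exercise_hours):
    """Change in predicted muscle mass (Kg) per extra 100 g of protein,
    at a given number of exercise hours per day: beta2 + beta3 * exercise."""
    return -0.4 + 1.4 * exercise_hours

def exercise_effect(protein_units):
    """Change in predicted muscle mass (Kg) per extra hour of daily exercise,
    at a given protein intake (in units of 100 g): beta1 + beta3 * protein."""
    return 1.0 + 1.4 * protein_units

for hours in (0, 1, 2):
    print(f"{hours} h/day of exercise: protein effect = {protein_effect(hours):.1f} Kg per 100 g")

for units in (0, 1, 2):
    print(f"{units * 100} g/day of protein: exercise effect = {exercise_effect(units):.1f} Kg per hour")
```

Each step of 1 unit in one variable shifts the effect of the other by β3 = 1.4 Kg, which is exactly what the two interpretations above state.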

## References

• Gelman A, Hill J, Vehtari A. Regression and Other Stories. Cambridge University Press; 2021.
• James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning with Applications in R. Springer; 2021.