Interpret the Linear Regression Intercept

For a linear regression model:

Y = β0 + β1 X

The linear regression intercept β0 is the predicted value of the outcome Y when the predictor X equals zero.

As an example, we will try to interpret the intercept β0 = 78.66 in the following linear regression model:

Heart Rate = 78.66 + 2.94 Smoking

This interpretation will depend on the type of the variable “smoking” and on the way it is coded, so let’s explore some of these options:

1. If smoking is a numerical variable (lifetime usage of tobacco in kilograms):

Heart Rate = 78.66 + 2.94 Smoking

The intercept β0 = 78.66 will be the expected value of heart rate for a person who has smoked 0 kg of tobacco in their lifetime.

This means that:

The expected heart rate for a never-smoker is 78.66 beats per minute.
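The statement "β0 is the prediction at X = 0" can be checked directly by evaluating the fitted equation. This minimal sketch hard-codes the example coefficients; the helper name `predicted_heart_rate` is hypothetical and only used for illustration.

```python
# Coefficients from the example model: Heart Rate = 78.66 + 2.94 Smoking
BETA0 = 78.66  # intercept (beats per minute)
BETA1 = 2.94   # change in heart rate per kg of lifetime tobacco use

def predicted_heart_rate(smoking_kg: float) -> float:
    """Hypothetical helper: evaluate the fitted equation for a given lifetime tobacco use (kg)."""
    return BETA0 + BETA1 * smoking_kg

# Setting the predictor to 0 returns the intercept itself.
print(predicted_heart_rate(0))  # 78.66
```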

2. If smoking is a binary variable (0: non-smoker, 1: smoker):

Heart Rate = 78.66 + 2.94 Smoking

The intercept β0 = 78.66 will be the average value of heart rate for the group where smoking equals 0.

Therefore:

The average heart rate for a non-smoker is 78.66 beats per minute.

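The same interpretation holds for ordinal predictors. For instance, if smoking were coded as:

  • 0: non-smoker
  • 1: light smoker
  • 2: moderate smoker
  • 3: heavy smoker

In this case, β0 = 78.66 would be the predicted heart rate for a non-smoker (smoking = 0).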

3. If smoking is a categorical variable with multiple levels (0: non-smoker, 1: cigarette smoker, 2: cigar smoker)

First, notice that here the numbers 0, 1, and 2 represent unordered categories of smoking. Therefore, they do not represent intensity, and it does not make sense to do calculations with them (such as taking their mean).

In general, a categorical variable with “N” levels can be included in a regression model only after recoding it into “N-1” binary (dummy) variables.

In this case, the 3 categories of smoking will be used to create 2 binary variables, each having a separate coefficient β:

  • The first variable is Cigarette smoker coded as follows: “1” if the person is a cigarette smoker, and “0” otherwise (i.e. non-smoker or cigar smoker).
  • The second variable is Cigar smoker coded as follows: “1” if the person is a cigar smoker, and “0” otherwise (i.e. non-smoker or cigarette smoker).
  • Non-smokers will be the reference group, so they will not get a separate variable. Instead, they are coded implicitly: if Cigarette smoker and Cigar smoker both equal 0, then the person must be a non-smoker.
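The coding scheme in the list above can be sketched as a small function. The level labels and the function name `dummy_code` are hypothetical choices for illustration; statistical packages usually do this step for you.

```python
# Hypothetical sketch: turn a 3-level categorical "smoking" variable into
# N - 1 = 2 binary (dummy) variables, with "non-smoker" as the reference group.
LEVELS = ["non-smoker", "cigarette smoker", "cigar smoker"]
REFERENCE = "non-smoker"

def dummy_code(value: str) -> dict[str, int]:
    """Return one 0/1 indicator per non-reference level."""
    if value not in LEVELS:
        raise ValueError(f"unknown smoking category: {value!r}")
    return {level: int(value == level) for level in LEVELS if level != REFERENCE}

for person in LEVELS:
    print(person, dummy_code(person))
```

A non-smoker gets 0 on both indicators, which is exactly how the reference group is identified implicitly.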

The model becomes:

Heart Rate = β0 + β1 Cigarette Smoker + β2 Cigar Smoker + ε

In this case, the intercept β0 will be the average outcome for the reference group:

β0 = 78.66 beats per minute is the average heart rate for non-smokers.
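To see why β0 becomes the reference-group average, the following sketch fits Heart Rate = β0 + β1 Cigarette Smoker + β2 Cigar Smoker by ordinary least squares (solving the normal equations in plain Python). The data are hypothetical, and the small `solve` routine is a generic Gaussian elimination used only to keep the example dependency-free.

```python
from statistics import mean

# Hypothetical data: (smoking category, heart rate in beats per minute).
data = [
    ("non-smoker", 77.0), ("non-smoker", 79.0), ("non-smoker", 81.0),
    ("cigarette smoker", 82.0), ("cigarette smoker", 84.0),
    ("cigar smoker", 80.0), ("cigar smoker", 82.0),
]

def design_row(category):
    # [intercept, Cigarette Smoker dummy, Cigar Smoker dummy]; non-smokers are the reference.
    return [1.0, float(category == "cigarette smoker"), float(category == "cigar smoker")]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

X = [design_row(cat) for cat, _ in data]
y = [hr for _, hr in data]

# Ordinary least squares via the normal equations (X^T X) beta = X^T y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

# b0 matches the non-smoker mean (up to floating-point rounding);
# b1 and b2 are each group's difference from non-smokers.
print(b0, mean(hr for cat, hr in data if cat == "non-smoker"))
print(b1, b2)
```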

4. If smoking is a standardized variable

A standardized variable is a variable rescaled to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean and dividing by the standard deviation for each value of the variable.
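The rescaling described above can be written in a few lines. This sketch assumes the sample standard deviation (`statistics.stdev`); some software standardizes with the population standard deviation (`statistics.pstdev`) instead. The tobacco values are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical lifetime tobacco use in kg.
smoking_kg = [0.0, 0.0, 2.5, 5.0, 10.0, 12.5]
z = standardize(smoking_kg)
print([round(v, 3) for v in z])
print(mean(z), stdev(z))  # approximately 0 and 1
```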

Standardization is useful when a model has more than one predictor, each measured on a different scale, and the goal is to compare their effects on the outcome: after standardization, the predictor Xi whose coefficient is largest in absolute value is the one that has the most influence on the outcome Y.

For more information on this subject, I recommend 2 of my articles: Standardized vs Unstandardized Regression Coefficients and How to Assess Variable Importance in Linear and Logistic Regression

In our example above, if smoking were a standardized variable, the intercept β0 = 78.66 could be interpreted as follows:

The expected heart rate for a person with an average amount of smoking in our study (i.e., smoking equal to the sample mean) is 78.66 beats per minute.

Problems with interpreting the intercept

Interpreting the intercept β0 when the predictor X cannot be set to 0

In the context of the example above:

Heart Rate = 78.66 + 2.94 Smoking

Suppose that the variable smoking is binary, but instead of being coded as:

  • 0: non-smoker
  • 1: smoker

it is coded as:

  • 1: non-smoker
  • 2: smoker

In this case, it does not make sense to set smoking = 0, and therefore β0 will have no clear interpretation.

We can, however, calculate the average heart rate of a non-smoker by setting smoking = 1, in which case the equation becomes:

Heart Rate = 78.66 + 2.94 × 1 = 81.6

So the average heart rate for a non-smoker is 81.6 beats per minute.

Note that, for this 1:2 coding, the coefficient β1, the confidence interval and the p-value associated with it will stay the same as for the 0:1 coding. The model’s R-squared and adjusted R-squared will also stay the same compared to the 0:1 coding.

Here are 2 other situations where we cannot or should not set the predictor X = 0:

  1. When X = 0 is not represented in our data: For example, if all participants in our study are adults, it will not make sense to set the variable age equal to zero (since the results of our regression model are not expected to generalize to newborns).
  2. When X cannot possibly equal zero: For example, for variables such as height, weight, etc.

Does it matter if the intercept is not statistically significant? And is it safe then to remove it from the model?

In our example above, the p-value associated with β0 answers the following question:

If in reality the average heart rate for a non-smoker is 0, then how likely would it be to get an intercept (β0) at least this far from 0 just by chance?

As you can see, we do not need a statistical test to tell us that the average heart rate for a non-smoker is not 0. It certainly is not!

So as long as our sample is large enough, the p-value associated with β0 will almost certainly be < 0.05.

Even if we did not have enough data to reject the hypothesis that β0 = 0 (so the p-value will be > 0.05), this does not mean that the true value for β0 is zero.

Therefore:

Do not automatically remove the intercept from the model just because it is not statistically significant. Removing the intercept forces β0 to be zero, which is inappropriate in some cases.

Further reading