For a simple linear regression model:
Y = β0 + β1 X + ε
The linear regression coefficient β1 associated with a predictor X is the expected difference in the outcome Y when comparing 2 groups that differ by 1 unit in X.
Another common interpretation of β1 is:
β1 is the expected change in the outcome Y per unit change in X. Therefore, increasing the predictor X by 1 unit (or going from 1 level to the next) is associated with an increase in Y by β1 units.
The latter interpretation implies that manipulating X will lead to a change in Y, which is a causal interpretation of the relationship between X and Y and so must be avoided unless:
- Your data come from an experimental design.
- You have identified and controlled for bias and confounding effects.
Let’s try to interpret the linear regression coefficients for the following example:
Suppose we want to study the relationship between Smoking and Heart Rate, so we used the linear regression model:
Heart Rate = β0 + β1 Smoking + ε
The following table summarizes the results of that model:
Notice that the coefficient of smoking is statistically significant (p < 0.05), which suggests that average heart rate differs across levels of smoking.
But how to interpret the magnitude of this relationship?
1. If smoking is a binary variable (0: non-smoker, 1: smoker):
Then β1 = 2.94 will be the average difference in heart rate between smokers and non-smokers.
So we can say that according to our model:
The heart rate of a smoker is expected to be 2.94 beats per minute higher compared to a non-smoker.
Note that we did not say that becoming a smoker increases your heart rate by 2.94 beats per minute. This is because our data come from an observational study and our model does not adjust for confounding (If you are interested in this subject, see An Example of Identifying and Adjusting for Confounding).
Interpreting the standard error:
The standard error (SE) is a measure of uncertainty in our estimation of the linear regression coefficient. It is useful for calculating the p-value and the confidence interval for the corresponding coefficient.
From the table above, we have: SE = 1.32.
We can calculate the 95% confidence interval using the following formula:
95% Confidence Interval = β1 ± 2 × SE = 2.94 ± 2 × 1.32 = [ 0.30, 5.58 ]
Remember that a 95% confidence interval comes from a procedure that, across repeated samples, produces a range containing the true value of the parameter we are trying to estimate 95% of the time.
So in our case, we can conclude that:
We are 95% confident that the average difference in heart rate between smokers and non-smokers is somewhere between 0.30 and 5.58 beats per minute.
Or informally, we can say:
Based on our data, we expect the average heart rate of smokers to be 0.30 to 5.58 beats per minute higher than that of non-smokers.
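As a quick check of the arithmetic above, here is a minimal R sketch that recomputes the interval from the reported estimate and standard error. The multiplier 2 is a rounded version of the normal quantile (about 1.96), and the variable names are my own. For a fitted lm object, the built-in confint() function would use the exact t quantile instead.

```r
# Estimate and standard error reported for the smoking coefficient
beta1 <- 2.94
se    <- 1.32

# Rule-of-thumb interval used in the text: estimate ± 2 × SE
ci_approx <- beta1 + c(-1, 1) * 2 * se

# Slightly narrower interval using the normal quantile (about 1.96)
ci_normal <- beta1 + c(-1, 1) * qnorm(0.975) * se

print(round(ci_approx, 2))  # 0.30 5.58
print(round(ci_normal, 2))  # 0.35 5.53
```

The two intervals differ only in the second decimal, which is why the shortcut with 2 is usually good enough for interpretation.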
Interpreting the intercept:
The intercept β0 should be interpreted assuming a value of 0 for all the predictors in the model.
And because Smoking = 0 refers to the group of non-smokers, the intercept β0 = 78.66 can be interpreted as follows:
The average heart rate for non-smokers is 78.66 beats per minute.
Alternatively, we can say that:
For a non-smoker, we predict a heart rate close to 79 beats per minute.
For more information on how to interpret the intercept in different situations, I wrote a separate article: Interpret the Linear Regression Intercept.
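To make the binary case concrete, here is a small sketch on simulated data (the sample size, the 3 beats per minute effect, and the variable names are all made up for illustration, they are not the data behind the table above). It shows that with a 0/1 predictor, the intercept is the mean of the group coded 0 and the coefficient is the difference between the two group means.

```r
set.seed(42)  # simulated data, for illustration only

n <- 200
smoker <- rbinom(n, size = 1, prob = 0.4)          # 0: non-smoker, 1: smoker
heart_rate <- 78 + 3 * smoker + rnorm(n, sd = 9)   # made-up effect of 3 bpm

model <- lm(heart_rate ~ smoker)

# Group means computed directly from the data
mean_non_smokers <- mean(heart_rate[smoker == 0])
mean_smokers     <- mean(heart_rate[smoker == 1])

# The intercept matches the non-smoker mean, and the slope matches
# the difference between the two group means
print(coef(model))
print(c(mean_non_smokers, mean_smokers - mean_non_smokers))
```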
2. If smoking is a numerical variable (lifetime usage of tobacco in kilograms)
Then the coefficient β1 = 2.94 can be interpreted as follows:
2.94 beats per minute is the average difference in heart rate between 2 groups of people that differ by 1 kg in lifetime tobacco usage.
Or equally, we can say that:
When comparing 2 persons who differ in lifetime tobacco usage, the one who uses 1 kg more is expected to have about 3 more heart beats per minute.
Interpreting the coefficient of a standardized variable:
A standardized variable is a variable rescaled to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean and dividing by the standard deviation for each value of the variable.
In our example above, if smoking were a standardized variable, the intercept β0 = 78.66 could be interpreted as follows:
The expected heart rate for a person with average lifetime tobacco usage in our study is 78.66 beats per minute.
However, the standardized coefficient of smoking β1 = 2.94 will not have an intuitive interpretation:
An average difference of 2.94 beats per minute in heart rate is expected when comparing 2 groups that differ by 1 standard deviation in lifetime tobacco usage!!
This is why the coefficient of a standardized variable is not meant to be interpreted on its own.
In fact, standardization is mainly used when you have more than 1 predictor in your model, each measured on a different scale, and your goal is to compare the effect of each on the outcome. After standardization, the predictor Xi with the largest coefficient in absolute value is the one that has the most important effect on the outcome Y.
Note that standardization will not produce comparable regression coefficients if the variables in the model have different standard deviations or follow different distributions (for more information, I recommend 2 of my articles: Standardized vs Unstandardized Regression Coefficients and How to Assess Variable Importance in Linear and Logistic Regression).
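Here is a minimal sketch of what standardization does to the coefficients, using simulated data (the distribution of tobacco usage, the slope of 1.5, and the variable names are assumptions made for illustration only). It shows that the standardized slope is the raw slope multiplied by the standard deviation of the predictor, and that the intercept becomes the mean of the outcome.

```r
set.seed(1)  # simulated data, for illustration only

n <- 300
tobacco_kg <- rgamma(n, shape = 2, rate = 0.5)           # hypothetical lifetime usage
heart_rate <- 75 + 1.5 * tobacco_kg + rnorm(n, sd = 8)   # made-up relationship

# Standardize: subtract the mean, then divide by the standard deviation
tobacco_std <- (tobacco_kg - mean(tobacco_kg)) / sd(tobacco_kg)

raw_model <- lm(heart_rate ~ tobacco_kg)
std_model <- lm(heart_rate ~ tobacco_std)

# Standardized slope = raw slope × SD of the predictor
print(coef(std_model)["tobacco_std"])
print(coef(raw_model)["tobacco_kg"] * sd(tobacco_kg))

# Intercept of the standardized model = mean of the outcome
print(coef(std_model)["(Intercept)"])
print(mean(heart_rate))
```

The fit itself is unchanged by standardization. Only the units in which the slope is expressed are different.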
3. If smoking is an ordinal variable (0: non-smoker, 1: light smoker, 2: moderate smoker, 3: heavy smoker)
Categorizing a predictor variable will result in loss of information and so it is not generally recommended. However, in some cases it makes sense when the relationship between the predictor and the outcome is not linear and cannot be corrected with a simple and interpretable variable transformation.
If smoking was divided into several ordered categories and entered in the model as a single numeric predictor, then β1 = 2.94 can be interpreted as follows:
An average of 2.94 more heart beats per minute is to be expected when comparing people in one level of smoking with those in the next.
4. If smoking is a categorical variable with multiple levels (0: non-smoker, 1: cigarette smoker, 2: cigar smoker)
First notice that here, the numbers 0, 1, and 2 represent unordered categories of smoking. They do not represent intensity, and it makes no sense to do calculations with them (such as taking their mean).
In general, a categorical variable with “N” levels can be included in a regression model only after dividing it into “N-1” binary variables.
In this case, the 3 categories of smoking will be used to create 2 binary variables, each having a separate coefficient β:
- The first variable is Cigarette smoker coded as follows: “1” if the person is a cigarette smoker, and “0” otherwise (i.e. non-smoker or cigar smoker).
- The second variable is Cigar smoker coded as follows: “1” if the person is a cigar smoker, and “0” otherwise (i.e. non-smoker or cigarette smoker).
- And non-smokers will be the reference group, so it will not be coded as a separate variable (instead it is coded implicitly since if cigarette smoker and cigar smoker both equal 0, then the person is definitely a non-smoker).
The model becomes:
Heart Rate = β0 + β1 Cigarette Smoker + β2 Cigar Smoker + ε
- β1 will correspond to the difference in the average heart rate between cigarette smokers and non-smokers (the reference group).
- β2 will correspond to the difference in the average heart rate between cigar smokers and non-smokers (the reference group).
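The coding described above can be sketched in R, where lm() creates the binary variables automatically when the predictor is a factor. The data below are simulated and the effect sizes and variable names are made up for illustration. The first factor level is used as the reference group.

```r
set.seed(7)  # simulated data, for illustration only

n <- 300
smoking <- factor(
  sample(c("non-smoker", "cigarette", "cigar"), n, replace = TRUE),
  levels = c("non-smoker", "cigarette", "cigar")  # first level is the reference
)
heart_rate <- 78 + 3 * (smoking == "cigarette") + 2 * (smoking == "cigar") +
  rnorm(n, sd = 9)                                 # made-up effects

# The design matrix contains N - 1 = 2 binary columns plus the intercept
print(head(model.matrix(~ smoking)))

model <- lm(heart_rate ~ smoking)
group_means <- sapply(split(heart_rate, smoking), mean)

# Each coefficient is a difference from the reference group's mean
print(coef(model))
print(group_means[c("cigarette", "cigar")] - group_means[["non-smoker"]])
```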
Since β1 and β2 only reflect the effect of cigarette smoking compared to non-smoking and cigar smoking compared to non-smoking, one important question remains unanswered:
What is the global effect of smoking on heart rate?
If you are doing your statistical analysis in R, use the drop1 function. This will test if dropping 1 variable will significantly affect the model, and it will do so for each variable in the model. The output will be a single F statistic and p-value for each predictor, including categorical variables, no matter how many levels they have.
Here’s the code for it:
model <- lm(Y ~ X)  # X should be a factor if it is a categorical predictor
drop1(model, . ~ ., test = "F")  # one F test per predictor
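For a runnable version of this idea, here is a sketch on simulated data (the effects and variable names are made up for illustration). With smoking as the only predictor, the drop1 test should agree with an F test comparing the model against an intercept-only model, which I include as a cross-check.

```r
set.seed(7)  # simulated data, for illustration only

n <- 300
smoking <- factor(
  sample(c("non-smoker", "cigarette", "cigar"), n, replace = TRUE),
  levels = c("non-smoker", "cigarette", "cigar")
)
heart_rate <- 78 + 3 * (smoking == "cigarette") + 2 * (smoking == "cigar") +
  rnorm(n, sd = 9)                                 # made-up effects

full_model <- lm(heart_rate ~ smoking)
null_model <- lm(heart_rate ~ 1)                   # intercept only

# One row for smoking as a whole (2 degrees of freedom), not one per level
global_test <- drop1(full_model, . ~ ., test = "F")
print(global_test)

# Cross-check: F test comparing the two nested models
nested_test <- anova(null_model, full_model)
print(nested_test)
```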
What if the p-value for the coefficient β1 is not statistically significant?
The p-value answers the following question:
If in reality the predictor (X) and the outcome (Y) were not related, then how likely would it be to get a coefficient (β1) this large just by chance?
Specifically, the p-value comes from testing if β1 is statistically different from 0.
In a lot of cases, as in our example above, it makes little sense to test if the relationship between the predictor and the outcome is “exactly” zero, as at least a very small positive or negative effect is more likely than an exact zero effect.
In such cases, the p-value should not be taken too seriously, as it will eventually fall below 0.05 once we have enough data to detect that small effect. (For more information, see 7 Tricks to Get Statistically Significant p-Values)
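This point can be illustrated with a small simulation (the sample size and the slope of 0.02 are arbitrary choices of mine, not values from the example above). The true relationship is practically negligible, yet with enough observations the p-value ends up far below 0.05.

```r
set.seed(123)  # simulated data, for illustration only

n <- 500000
x <- rnorm(n)
y <- 0.02 * x + rnorm(n)   # made-up, practically negligible slope

model <- lm(y ~ x)
estimate <- coef(model)["x"]
p_value  <- summary(model)$coefficients["x", "Pr(>|t|)"]

# The estimate stays tiny, but the p-value is extremely small
print(estimate)
print(p_value)
```

So a small p-value here says more about the amount of data than about the size of the effect, which is why the magnitude of the coefficient and its confidence interval deserve more attention.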