Here’s the equation of a logistic regression model with one predictor X:
$$\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X$$
Where P is the probability of having the outcome and P / (1-P) is the odds of the outcome.
The easiest way to interpret the intercept is at X = 0, because the model then reduces to:
$$\log\left(\frac{P}{1-P}\right) = \beta_0$$
So when X = 0, the intercept β0 is the log of the odds of having the outcome.
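To make the relationship between P, the odds and the log odds concrete, here is a minimal Python sketch. The helpers `log_odds` and `linear_predictor` are hypothetical names chosen for illustration, and the coefficients reuse the values from the example table further down.

```python
import math

def log_odds(p: float) -> float:
    """Log of the odds P / (1 - P), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def linear_predictor(b0: float, b1: float, x: float) -> float:
    """Right-hand side of the model, b0 + b1 * x, i.e. the predicted log odds."""
    return b0 + b1 * x

b0, b1 = -1.93, 0.38  # coefficients from the example table below

# At X = 0 the slope term drops out, so the predicted log odds equal the intercept.
print(linear_predictor(b0, b1, 0.0))

# Even odds (P = 0.5) correspond to log odds of exactly zero.
print(log_odds(0.5))
```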
From log odds to probability
Because the concepts of odds and log odds are hard to grasp, we can solve for P to express the probability of having the outcome directly in terms of the intercept β0.
To solve for the probability P, we exponentiate both sides of the equation above and rearrange to get:
$$P = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
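Spelled out, the rearrangement goes as follows, writing $z = \beta_0 + \beta_1 X$ for brevity:
$$\begin{aligned}
\frac{P}{1-P} &= e^{z} \\
P &= e^{z}(1 - P) \\
P + P\,e^{z} &= e^{z} \\
P &= \frac{e^{z}}{1 + e^{z}}
\end{aligned}$$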
With this equation, we can calculate the probability P for any given value of X, but when X = 0 the interpretation becomes simpler:
When X = 0, the probability of having the outcome is $P = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$.
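In code, this formula is the inverse of the log-odds transformation, commonly called the logistic (or sigmoid) function. A minimal Python sketch, where `intercept_to_probability` is a hypothetical helper name:

```python
import math

def intercept_to_probability(b0: float) -> float:
    """Probability of the outcome at X = 0: e^b0 / (1 + e^b0)."""
    return math.exp(b0) / (1 + math.exp(b0))

print(intercept_to_probability(0.0))  # 0.5
```

The same value can also be written as 1 / (1 + e^(-β0)); the two forms are algebraically identical, and the version above simply mirrors the equation as written.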
Since $e^{\beta_0}$ is less than 1 when β0 < 0, equal to 1 when β0 = 0, and greater than 1 when β0 > 0, we can tell from the sign of the intercept alone, without calculating the probability, that:
- If the intercept is negative, the probability of having the outcome is below 0.5.
- If the intercept is positive, the probability of having the outcome is above 0.5.
- If the intercept is zero, the probability of having the outcome is exactly 0.5.
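A quick numerical check of the three cases, using a handful of arbitrary intercept values chosen only for illustration:

```python
import math

def intercept_to_probability(b0: float) -> float:
    # e^b0 / (1 + e^b0): probability of the outcome when X = 0
    return math.exp(b0) / (1 + math.exp(b0))

# Arbitrary example intercepts: negative, zero and positive
for b0 in (-2.0, -0.5, 0.0, 0.5, 2.0):
    p = intercept_to_probability(b0)
    side = "< 0.5" if p < 0.5 else ("> 0.5" if p > 0.5 else "= 0.5")
    print(f"b0 = {b0:+.1f}  ->  P = {p:.3f}  ({side})")
```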
Let’s illustrate this with an example
Suppose we want to study the effect of smoking on the 10-year risk of heart disease. The table below shows the summary of a logistic regression that models the presence of heart disease using smoking as a predictor:
Term | Coefficient | Standard Error | p-value |
---|---|---|---|
Intercept | -1.93 | 0.13 | < 0.001 |
Smoking | 0.38 | 0.17 | 0.03 |
So our objective is to interpret the intercept β0 = -1.93.
Using the equation above and assuming a value of 0 for smoking:
$$P = \frac{e^{\beta_0}}{1 + e^{\beta_0}} = \frac{e^{-1.93}}{1 + e^{-1.93}} \approx 0.13$$
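The same arithmetic in Python, plugging in the intercept from the table:

```python
import math

b0 = -1.93              # intercept from the regression table
odds = math.exp(b0)     # odds of heart disease when smoking = 0 (about 0.145)
p = odds / (1 + odds)   # convert odds to probability
print(round(p, 2))      # 0.13
```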
But what does it mean to set the variable smoking = 0?
1. If smoking is a continuous variable (annual tobacco consumption in kilograms)
In this context, smoking = 0 refers to a group whose annual tobacco consumption is 0 kg, i.e. non-smokers.
So the interpretation becomes:
The probability that a non-smoker will have heart disease in the next 10 years is 0.13.
1.1. What if smoking were a standardized variable?
A standardized variable is a variable rescaled to have a mean of 0 and a standard deviation of 1. This is done by subtracting the mean and dividing by the standard deviation for each value of the variable. The goal is to force predictors to be on the same scale so that their effects on the outcome can be compared just by looking at their coefficients.
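A minimal sketch of the standardization step described above, using Python's standard `statistics` module. The consumption values are made up purely for illustration, and the sketch uses the sample standard deviation (`statistics.stdev`); some software uses the population version instead.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1: (x - mean) / sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / sd for x in values]

# Hypothetical annual tobacco consumption in kg (illustrative numbers only)
consumption = [0.0, 0.5, 1.0, 1.5, 3.0]
z = standardize(consumption)
print([round(v, 2) for v in z])
```

A standardized value of 0 corresponds to the sample mean, which is 1.2 kg for these made-up numbers.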
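In this case, smoking = 0 corresponds to the mean annual tobacco consumption, and the interpretation becomes:
For an average consumer of tobacco, the probability of having heart disease in the next 10 years is 0.13 (assuming the model fitted on the standardized variable also has an intercept of -1.93; standardizing a predictor generally shifts the intercept).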
1.2. What if all subjects in our study were smokers?
Then setting the smoking variable equal to 0 no longer makes sense. Since non-smokers are not represented in the data, we cannot expect our results to generalize to this group.
In this case, it makes sense to evaluate the model at a value of smoking other than 0. For instance, we can take the minimum, maximum or mean of the variable smoking as a reference point.
Let’s pick the maximum as a reference and calculate the risk of heart disease at the highest level of smoking observed in the sample.
Suppose that in our sample the largest amount of tobacco smoked in a year was 3 kg. Then:
$$P = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \quad \text{with } X = 3 \text{ kg}$$
Replacing the numbers, β0 + β1X = -1.93 + 0.38 × 3 = -0.79, which gives P ≈ 0.31.
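The calculation for X = 3 kg, written out in Python with the coefficients from the table (the 3 kg maximum is the hypothetical value assumed above):

```python
import math

b0, b1 = -1.93, 0.38  # intercept and smoking coefficient from the table
x_max = 3.0           # assumed largest annual consumption in the sample (kg)

log_odds = b0 + b1 * x_max  # -1.93 + 0.38 * 3 = -0.79
p = math.exp(log_odds) / (1 + math.exp(log_odds))
print(round(p, 2))  # 0.31
```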
The interpretation becomes:
The maximum annual tobacco consumption of 3 kg is associated with a 31% risk of having heart disease in the next 10 years.
2. If smoking is a binary variable (0: non-smoker, 1: smoker)
Then setting smoking = 0 (non-smoker) gives the same equation as before:
$$P = \frac{e^{\beta_0}}{1 + e^{\beta_0}} = \frac{e^{-1.93}}{1 + e^{-1.93}} \approx 0.13$$
And the interpretation also stays the same:
The probability that a non-smoker will have heart disease in the next 10 years is 0.13.
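Note: If smoking were measured on a scale from 1 to 10 (no zero)
Then smoking = 0 lies outside the range of the data, so we interpret the intercept at one of the observed values instead (for example, the minimum of 1), using the equation above as we did in section 1.2.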