Most research papers consider a VIF (Variance Inflation Factor) > 10 as an indicator of multicollinearity, but some choose a more conservative threshold of 5 or even 2.5.

So what threshold should YOU choose?

When choosing a VIF threshold, keep in mind that multicollinearity is less of a problem with a large sample size than with a small one. [Source]

That being said, here’s a list of references for different VIF thresholds recommended to detect collinearity in a multivariable (linear or logistic) model:

| VIF Threshold | Reference Type | Reference Date | Reference |
|---|---|---|---|
| VIF > 10 is problematic | Book | 2005 | Vittinghoff E. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. Springer; 2005. |
| VIF > 5 or VIF > 10 is problematic | Book | 2013 | James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: With Applications in R. Springer; 2013 (corr. 7th printing, 2017). |
| VIF > 5 is cause for concern and VIF > 10 indicates a serious collinearity problem | Book | 2001 | Menard S. Applied Logistic Regression Analysis. 2nd edition. SAGE Publications, Inc; 2001. |
| VIF ≥ 2.5 indicates considerable collinearity | Research Paper | 2018 | Johnston R, Jones K, Manley D. Confounding and collinearity in regression analysis: a cautionary tale and an alternative procedure, illustrated by studies of British voting behaviour. Qual Quant. 2018;52(4):1957-1976. doi:10.1007/s11135-017-0584-6 |

## How to interpret a given VIF value?

Consider the following linear regression model:

Y = β_{0} + β_{1} × X_{1} + β_{2} × X_{2} + β_{3} × X_{3} + ε

For each of the independent variables X_{1}, X_{2} and X_{3} we can calculate the variance inflation factor (VIF) in order to determine if we have a multicollinearity problem.

Here’s the formula for calculating the VIF for X_{1}:

VIF(X_{1}) = 1 / (1 − R^{2})

R^{2} in this formula is the coefficient of determination from the linear regression model which has:

- X_{1} as dependent variable
- X_{2} and X_{3} as independent variables

In other words, R^{2} comes from the following linear regression model:

X_{1} = β_{0} + β_{1} × X_{2} + β_{2} × X_{3} + ε

And because R^{2} is a number between 0 and 1:

- When R^{2} is close to 1 (i.e. X_{2} and X_{3} are highly predictive of X_{1}): the VIF will be very large
- When R^{2} is close to 0 (i.e. X_{2} and X_{3} are not related to X_{1}): the VIF will be close to 1

Therefore the range of VIF is between 1 and infinity.
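The auxiliary-regression recipe above can be sketched with plain NumPy (the simulated data and the `vif` helper are assumptions for illustration, with X_{1} built to depend on X_{2} and X_{3}):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulated predictors: x1 depends on x2 and x3 by construction (assumed data)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.8 * x2 + 0.5 * x3 + rng.normal(size=n)

def vif(target, others):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing `target`
    on the other predictors (with an intercept)."""
    X = np.column_stack([np.ones_like(target)] + list(others))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

print(vif(x1, [x2, x3]))  # well above 1: x2 and x3 partly predict x1
print(vif(x2, [x3]))      # close to 1: x2 and x3 were generated independently
```

Because the auxiliary R^{2} can never be negative (with an intercept), the computed VIF is always at least 1, matching the range stated above.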

Now, let’s discuss how to interpret the following cases where:

- VIF = 1
- VIF = 2.5
- VIF = +∞

### Example 1: VIF = 1

A VIF of 1 for a given independent variable (say for X_{1} from the model above) indicates the total absence of collinearity between this variable and other predictors in the model (X_{2} and X_{3}).

### Example 2: VIF = 2.5

If for example the variable X_{3} in our model has a VIF of 2.5, this value can be interpreted in 2 ways:

- The variance of β_{3} (the estimated regression coefficient of X_{3}) is 2.5 times greater than it would have been if X_{3} had been entirely unrelated to the other variables in our model
- The variance of β_{3} is 150% greater than it would be if there were no collinearity at all between X_{3} and the other variables in our model

Why 150%?

This percentage is calculated by subtracting 1 (the value of VIF if there were no collinearity) from the actual value of VIF:

2.5 – 1 = 1.5

In percent:

1.5 × 100% = 150%
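This arithmetic is easy to verify; in the sketch below, the R^{2} value of 0.6 is an assumed example chosen so that the VIF comes out to exactly 2.5:

```python
# Assumed R^2 from regressing X3 on the other predictors (illustrative value)
r_squared = 0.6

vif = 1 / (1 - r_squared)     # 1 / 0.4 = 2.5
excess_pct = (vif - 1) * 100  # (2.5 - 1) * 100 = 150.0

print(vif, excess_pct)
```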

### Example 3: VIF = +∞

An infinite value of VIF for a given independent variable indicates that it can be perfectly predicted by other variables in the model.

Looking at the equation above, this happens when R^{2} equals 1; the VIF grows without bound as R^{2} approaches 1.
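A quick sketch of this degenerate case (the data are assumptions, with one predictor built as an exact linear function of another):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = 2.0 * x2 + 1.0  # x3 is perfectly predicted by x2 (assumed data)

# Auxiliary regression of x2 on x3 (with an intercept)
X = np.column_stack([np.ones(n), x3])
beta, *_ = np.linalg.lstsq(X, x2, rcond=None)
resid = x2 - X @ beta
r2 = 1 - (resid ** 2).sum() / ((x2 - x2.mean()) ** 2).sum()

# R^2 is numerically 1, so 1 / (1 - R^2) blows up toward infinity
vif = np.inf if np.isclose(r2, 1.0) else 1 / (1 - r2)
print(r2, vif)
```

In practice, software either reports an enormous VIF or fails outright here, since the design matrix is rank-deficient.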