What is an Acceptable Value for VIF? (With References)

Most research papers consider a VIF (Variance Inflation Factor) > 10 as an indicator of multicollinearity, but some choose a more conservative threshold of 5 or even 2.5.

So what threshold should YOU choose?

When choosing a VIF threshold, you should take into account that multicollinearity is a lesser problem when dealing with a large sample size compared to a smaller one. [Source]

That being said, here’s a list of references for different VIF thresholds recommended to detect collinearity in a multivariable (linear or logistic) model:

VIF ThresholdReference TypeReference DateReference
VIF > 10 is problematicBook2012Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. 2nd ed. 2012 edition. Springer; 2011.
VIF > 5 or VIF > 10 is problematicBook2017James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: With Applications in R. 1st ed. 2013, Corr. 7th printing 2017 edition. Springer; 2013.
VIF > 5 is cause for concern and VIF > 10 indicates a serious collinearity problemBook2001Menard S. Applied Logistic Regression Analysis. 2nd edition. SAGE Publications, Inc; 2001.
VIF ≥ 2.5 indicates considerable collinearityResearch Paper2018Johnston R, Jones K, Manley D. Confounding and collinearity in regression analysis: a cautionary tale and an alternative procedure, illustrated by studies of British voting behaviour. Qual Quant. 2018;52(4):1957-1976. doi:10.1007/s11135-017-0584-6

How to interpret a given VIF value?

Consider the following linear regression model:

Y = β0 + β1 × X1 + β2 × X2 + β3 × X3 + ε

For each of the independent variables X1, X2 and X3 we can calculate the variance inflation factor (VIF) in order to determine if we have a multicollinearity problem.

Here’s the formula for calculating the VIF for X1:

VIF (variance inflating factor) formula for the first variable in the model

R2 in this formula is the coefficient of determination from the linear regression model which has:

  • X1 as dependent variable
  • X2 and X3 as independent variables

In other words, R2 comes from the following linear regression model:

X1 = β0 + β1 × X2 + β2 × X3 + ε

And because R2 is a number between 0 and 1:

  • When R2 is close to 1 (i.e. X2 and X3 are highly predictive of X1): the VIF will be very large
  • When R2 is close to 0 (i.e. X2 and X3 are not related to X1): the VIF will be close to 1

Therefore the range of VIF is between 1 and infinity.

Now, let’s discuss how to interpret the following cases where:

  1. VIF = 1
  2. VIF = 2.5
  3. VIF = +∞

Example 1: VIF = 1

A VIF of 1 for a given independent variable (say for X1 from the model above) indicates the total absence of collinearity between this variable and other predictors in the model (X2 and X3).

Example 2: VIF = 2.5

If for example the variable X3 in our model has a VIF of 2.5, this value can be interpreted in 2 ways:

  1. The variance of β3 (the regression coefficient of X3) is 2.5 times greater than it would have been if X3 had been entirely non-related to other variables in our model
  2. The variance of β3 is 150% greater than it would be if there were no collinearity effect at all between X3 and other variables in our model

Why 150%?

This percentage is calculated by subtracting 1 (the value of VIF if there were no collinearity) from the actual value of VIF:

2.5 – 1 = 1.5

In percent:

1.5 * 100 / 100 = 150%.

Example 3: VIF = Infinity

An infinite value of VIF for a given independent variable indicates that it can be perfectly predicted by other variables in the model.

Looking at the equation above, this happens when R2 approaches 1.

Further reading