How is VIF calculated for dummy variables?

by carolina   Last Updated May 29, 2020 20:19 PM

I have a logistic regression model with 11 explanatory variables, 5 of which are dummy variables. When I use the vif() function from the car package in R, it gives me a VIF value for each of them. As far as I understand, the VIF of a variable is 1/(1 - R^2), where R^2 comes from regressing that explanatory variable, as the response, on the other explanatory variables. But when this variable is dichotomous, that would mean running a logistic regression, so R^2 is of no use. Any ideas on how the VIF is calculated then? Or is it not valid?

Answers 1

You have to be careful with VIFs, as they are not always calculated the way you might think.

This answer shows that the VIF calculation in the car package's implementation is based on the coefficient covariance matrix. For an ordinary least-squares regression that gives the same result as your formula (the one most easily found online), but not necessarily for a generalized linear model like your logistic regression, where the coefficient covariance matrix also depends on the weights from the model fit.
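A minimal NumPy sketch of this point (Python rather than R, so an illustration of the idea, not car's own code). It assumes car::vif works as described above: drop the intercept from the coefficient covariance matrix, rescale it to a correlation matrix R, and report det(R[-j,-j]) / det(R) for each single-coefficient term. For a logistic fit, whose covariance matrix is (X'WX)^-1, that number matches 1/(1 - R^2) only when the auxiliary regressions are weighted by the IRLS weights p(1 - p); the unweighted formula generally gives something different. The helper names (cov_based_vif, aux_r2_vif, fit_logistic_irls) and the simulated data are hypothetical.

```python
import numpy as np

def cov_based_vif(vcov):
    """VIF per slope from a coefficient covariance matrix whose first row/column
    is the intercept: det(R[-j,-j]) / det(R) on the correlation matrix R."""
    v = vcov[1:, 1:]                              # drop the intercept
    sd = np.sqrt(np.diag(v))
    R = v / np.outer(sd, sd)                      # analogue of R's cov2cor()
    detR = np.linalg.det(R)
    k = R.shape[0]
    out = np.empty(k)
    for j in range(k):
        rest = [i for i in range(k) if i != j]
        out[j] = np.linalg.det(R[np.ix_(rest, rest)]) / detR
    return out

def aux_r2_vif(X, w=None):
    """1 / (1 - R^2) from least-squares regressions of each column of X on the
    other columns plus an intercept, optionally weighted by w."""
    n, k = X.shape
    w = np.ones(n) if w is None else w
    sw = np.sqrt(w)
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        resid = y - Z @ beta
        ybar = np.sum(w * y) / np.sum(w)          # weighted mean
        r2 = 1 - np.sum(w * resid**2) / np.sum(w * (y - ybar) ** 2)
        out[j] = 1 / (1 - r2)
    return out

def fit_logistic_irls(X1, y, iters=50):
    """Bare-bones IRLS for logistic regression (X1 includes the column of 1s).
    Returns the coefficients and the working weights p(1 - p) at the final fit."""
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X1 @ beta))
        w = p * (1 - p)
        z = X1 @ beta + (y - p) / w               # working response
        beta = np.linalg.solve(X1.T @ (w[:, None] * X1), X1.T @ (w * z))
    p = 1 / (1 + np.exp(-X1 @ beta))
    return beta, p * (1 - p)

# Simulated data: two correlated continuous predictors and one dummy.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
d = (x1 + rng.normal(size=n) > 0).astype(float)
X = np.column_stack([x1, x2, d])
X1 = np.column_stack([np.ones(n), X])
eta = -0.5 + 2.0 * x1 - 1.0 * x2 + 1.5 * d
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(float)

beta, w = fit_logistic_irls(X1, y)
vcov = np.linalg.inv(X1.T @ (w[:, None] * X1))    # (X'WX)^-1, dispersion fixed at 1

print("covariance-based VIF:     ", cov_based_vif(vcov))
print("unweighted 1/(1 - R^2):   ", aux_r2_vif(X))
print("IRLS-weighted 1/(1 - R^2):", aux_r2_vif(X, w))
```

The dummy d is treated exactly like the continuous columns here: its auxiliary regression is a (weighted) least-squares fit, not a logistic one.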

There is an additional issue when a categorical predictor has more than 2 levels: the car implementation then reports a generalized VIF (GVIF) that covers all the levels of that predictor together. See this answer for an example.
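A sketch of the generalized VIF for a term that spans several columns, again in NumPy and assuming car's formula det(R11) * det(R22) / det(R), where R11 is the block of the coefficient correlation matrix R for the term's coefficients and R22 the block for all other slopes, reported alongside the df-adjusted GVIF^(1/(2*Df)). The three-level factor, the data, and the function names are hypothetical. Switching the baseline level changes the VIFs of the individual dummy columns but leaves the GVIF unchanged, which is why it makes sense to assess the levels together.

```python
import numpy as np

def slope_corr(vcov):
    """Drop the intercept row/column and rescale to a correlation matrix (like cov2cor)."""
    v = vcov[1:, 1:]
    sd = np.sqrt(np.diag(v))
    return v / np.outer(sd, sd)

def gvif(vcov, assign):
    """Generalized VIF per term: det(R11) * det(R22) / det(R).
    `assign` gives the term id of each slope column (intercept excluded)."""
    R = slope_corr(vcov)
    detR = np.linalg.det(R)
    assign = np.asarray(assign)
    out = {}
    for term in np.unique(assign):
        a = np.flatnonzero(assign == term)
        b = np.flatnonzero(assign != term)
        g = np.linalg.det(R[np.ix_(a, a)]) * np.linalg.det(R[np.ix_(b, b)]) / detR
        df = len(a)
        out[int(term)] = (g, df, g ** (1 / (2 * df)))   # (GVIF, Df, GVIF^(1/(2*Df)))
    return out

def column_vifs(vcov):
    """Ordinary one-column-at-a-time VIFs, diag(R^-1), ignoring the term grouping."""
    return np.diag(np.linalg.inv(slope_corr(vcov)))

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
# Hypothetical three-level factor whose level depends on x, so it is collinear with x.
level = np.digitize(x + rng.normal(size=n), [-0.5, 0.5])   # codes 0, 1, 2
onehot = np.eye(3)[level]

def ols_vcov(ref):
    """(X'X)^-1 for intercept, x, and treatment dummies with `ref` as the baseline.
    The OLS residual variance is omitted because it cancels in the correlation matrix."""
    X = np.column_stack([np.ones(n), x, np.delete(onehot, ref, axis=1)])
    return np.linalg.inv(X.T @ X)

assign = [1, 2, 2]   # first slope is x (term 1); the two dummies form the factor (term 2)
for ref in (0, 1):
    print(f"baseline level {ref}")
    print("  per-column VIFs:", column_vifs(ols_vcov(ref)))
    print("  (GVIF, Df, GVIF^(1/(2*Df))) by term:", gvif(ols_vcov(ref), assign))
```

For a term with a single column, the GVIF reduces to that column's ordinary VIF.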

Finally, for a binary predictor in an ordinary least-squares regression, the standard VIF formula you cite can still be used. When that predictor is temporarily treated as the outcome for the purpose of calculating its VIF, it is regressed on the other predictors by ordinary least squares, not by logistic regression, despite its binary nature.
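A minimal check of this for an ordinary least-squares model, written in NumPy with simulated data (all variable names are hypothetical). The binary column d is regressed on the other predictors by plain least squares to get 1/(1 - R^2), and the same number is recovered from the coefficient covariance matrix, assuming the covariance-matrix approach described above. For a single column, the determinant ratio det(R[-j,-j]) / det(R) equals the diagonal element of R^-1, which is what the sketch uses.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
d = (x1 + 0.5 * rng.normal(size=n) > 0).astype(float)   # binary predictor tied to x1
X1 = np.column_stack([np.ones(n), x1, x2, d])

# VIF of d the "textbook" way: ordinary least squares of the 0/1 column
# on the other predictors (a linear fit, not a logistic one).
Z = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(Z, d, rcond=None)
resid = d - Z @ coef
r2 = 1 - resid @ resid / np.sum((d - d.mean()) ** 2)
vif_aux = 1 / (1 - r2)

# Same quantity from the OLS coefficient covariance matrix sigma^2 * (X'X)^-1;
# sigma^2 cancels once the slope block is rescaled to a correlation matrix.
v = np.linalg.inv(X1.T @ X1)[1:, 1:]
sd = np.sqrt(np.diag(v))
R = v / np.outer(sd, sd)
vif_cov = np.linalg.inv(R)[2, 2]            # diagonal element of R^-1 for d

print(vif_aux, vif_cov)
```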

April 01, 2020 21:50 PM
