This component calculates the Variance Inflation Factor (VIF) for every numeric variable in the input data table. It can be used to identify and remove collinear features before fitting a regression model.
Multicollinearity occurs when two or more columns are correlated with each other and provide redundant information when jointly considered as predictors of a model. VIF is used to diagnose the extent of multicollinearity among the predictors of a model. For instance, a VIF of 3 tells us that the variance of the estimated regression coefficient of a column is 3 times larger than it would be if that column were fully uncorrelated with all other predictors.
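The text above does not spell out how VIF is obtained. Under the standard textbook definition (an assumption about what this component computes internally), predictor j is regressed on all other predictors, and the resulting coefficient of determination R_j^2 gives:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\qquad\Longleftrightarrow\qquad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}
```

With this definition, a VIF of 1 corresponds to R_j^2 = 0 (the column is uncorrelated with the others), and a VIF of 3 corresponds to R_j^2 = 2/3, meaning two thirds of that column's variance can be explained by the remaining predictors.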
As a rule of thumb, a VIF higher than 5 indicates a problematic amount of collinearity, and columns above that value are candidates for removal as predictors of a model, which reduces dimensionality while limiting collinearity (James et al., 2014).
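The rule of thumb can be illustrated with a minimal Python sketch that uses only the standard library. It is not the component's actual implementation. It relies on the known identity that the VIF of each column equals the corresponding diagonal entry of the inverse correlation matrix. The function names (`pearson`, `invert`, `vif`, `drop_collinear`) are hypothetical, and dropping the highest-VIF column one at a time is one common strategy assumed here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Perfectly collinear columns make the correlation matrix singular.
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Map each column name to its VIF (diagonal of the inverse correlation matrix)."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_collinear(columns, threshold=5.0):
    """Repeatedly drop the column with the highest VIF until all are <= threshold.

    Assumed strategy for illustration; the threshold of 5 follows the rule of thumb.
    """
    remaining = dict(columns)
    while len(remaining) > 1:
        scores = vif(remaining)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del remaining[worst]
    return list(remaining)
```

For example, `drop_collinear({"a": [...], "b": [...], "c": [...]})` returns the names of the columns that survive. VIFs are recomputed after each removal because dropping one column changes the VIF of every remaining column.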
The interactive view of the component shows the VIF values and highlights the ones above the threshold.
References:
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.
This component is free to use and modify.
Author: Andrea De Mauro, aboutbigdata.net
- Type: Table. Input table.