Robust VIF regression with application to variable selection in large data sets
Debbie J. Dupuis, Maria-Pia Victoria-Feser

TL;DR
This paper introduces a robust VIF regression method for large data sets that improves variable selection by handling outliers effectively, with an application to economic and demographic data used to predict educational attainment.
Contribution
A novel robust VIF regression approach that stays efficient on outlier-free data and robust on contaminated data, improving variable selection in large-scale regression problems.
Findings
Robust VIF regression outperforms classical VIF regression in the presence of outliers.
The method retains the computational speed needed for large data sets.
It improves prediction accuracy in the economic and demographic application.
Abstract
The sophisticated and automated means of data collection used by an increasing number of institutions and companies leads to extremely large data sets. Subset selection in regression is essential when a huge number of covariates can potentially explain a response variable of interest. The recent statistical literature has seen an emergence of new selection methods that provide some type of compromise between implementation (computational speed) and statistical optimality (e.g., prediction error minimization). Global methods such as Mallows' Cp have been supplanted by sequential methods such as stepwise regression. More recently, streamwise regression, faster than the former, has emerged. A recently proposed streamwise regression approach based on the variance inflation factor (VIF) is promising, but its least-squares based implementation makes it susceptible to the outliers…
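To make the outlier sensitivity mentioned in the abstract concrete, here is a minimal sketch of the classical, least-squares VIF for covariate j, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing covariate j on the other covariates. This is not the authors' robust estimator. The function name `vif`, the simulated data, and the outlier placement are hypothetical choices for illustration. It shows how a single gross outlier can distort the least-squares VIF and hide real collinearity.

```python
import numpy as np

def vif(X, j):
    """Classical least-squares VIF for column j of X (hypothetical helper).

    Regress X[:, j] on the remaining columns plus an intercept,
    compute R^2, and return 1 / (1 - R^2).
    """
    n = X.shape[0]
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(n), others])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # ordinary least squares fit
    resid = y - Z @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 / (1.0 - r2)

# Simulated (hypothetical) data: x2 is strongly correlated with x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])
clean = vif(X, 0)

# Replace one row with a gross outlier that breaks the x1-x2 relationship.
X_bad = X.copy()
X_bad[0] = [50.0, 0.0]
contaminated = vif(X_bad, 0)

print(f"VIF on clean data: {clean:.2f}")
print(f"VIF with one outlier: {contaminated:.2f}")
```

Because the least-squares R² is driven by the single extreme row, the contaminated VIF falls close to 1 even though the other 199 observations are strongly collinear. A robust version would downweight such a point; the specific robust procedure is the one developed in the paper and is not reproduced here.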
