Robust VIF regression with application to variable selection in large data sets

Debbie J. Dupuis; Maria-Pia Victoria-Feser

arXiv:1304.5349·stat.AP·April 22, 2013

TL;DR

This paper introduces a robust VIF regression method for large data sets. It improves variable selection accuracy in the presence of outliers, with an application to economic and demographic data used to predict educational attainment.

Contribution

A novel robust VIF regression approach that maintains efficiency in outlier-free data and robustness in contaminated data, enhancing variable selection in large-scale regression problems.

Findings

01

Robust VIF outperforms classical VIF in the presence of outliers.

02

Method maintains computational speed suitable for large datasets.

03

Improves prediction accuracy in economic and demographic studies.

Abstract

The sophisticated and automated means of data collection used by an increasing number of institutions and companies leads to extremely large data sets. Subset selection in regression is essential when a huge number of covariates can potentially explain a response variable of interest. The recent statistical literature has seen an emergence of new selection methods that provide some type of compromise between implementation (computational speed) and statistical optimality (e.g., prediction error minimization). Global methods such as Mallows' $C_{p}$ have been supplanted by sequential methods such as stepwise regression. More recently, streamwise regression, faster than the former, has emerged. A recently proposed streamwise regression approach based on the variance inflation factor (VIF) is promising, but its least-squares based implementation makes it susceptible to the outliers…
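
For readers unfamiliar with the quantity the method is named after, the following is a minimal sketch of the textbook variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from a least-squares regression of covariate j on the remaining covariates. It is not the streamwise procedure or the robust estimator of the paper; the function name `vif` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def vif(X, j):
    """Textbook VIF of column j (hypothetical helper, not the paper's algorithm)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    # Regress column j on the other columns, with an intercept.
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)

# Simulated covariates (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
X = np.column_stack([x1, x2, x3])

vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

In this simulation the two nearly collinear columns receive large VIF values while the independent column stays close to 1. Because each R_j^2 is computed by least squares, a few extreme observations can distort it, which is the sensitivity the abstract points to.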
