Fast and Robust Least Squares Estimation in Corrupted Linear Models
Brian McWilliams, Gabriel Krummenacher, Mario Lucic, Joachim M. Buhmann

TL;DR
This paper introduces a robust subsampling algorithm for large-scale linear regression that effectively detects and limits the influence of corrupted data points, improving estimation accuracy in the presence of outliers.
Contribution
It develops a randomized influence-based subsampling method that enhances robustness and efficiency in large-scale corrupted linear regression models.
Findings
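On simulated datasets, the proposed method improves over current state-of-the-art approximation schemes for ordinary least squares.
It detects corrupted observations through their influence and limits the contribution of those points to the estimate.
The same improvement is shown empirically on real datasets, under a general model of corrupted observations.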
Abstract
Subsampling methods have been recently proposed to speed up least squares estimation in large scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations as shown in this paper. This property of influence -- for which we also develop a randomized approximation -- motivates our proposed subsampling algorithm for large scale corrupted linear regression which limits the influence of data points since highly influential points contribute most to the residual error. Under a general model of corrupted observations, we show theoretically and empirically on a variety of simulated and real datasets that our algorithm improves over the current state-of-the-art approximation schemes for ordinary least squares.
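The idea in the abstract can be illustrated with a small sketch. It assumes the standard regression-diagnostic quantities: leverage l_i (the i-th diagonal entry of the hat matrix), OLS residual e_i, and influence d_i = e_i^2 * l_i / (1 - l_i)^2, which equals the squared change in the fitted values when point i is deleted. The function names, the inverse-influence sampling rule, and the synthetic corruption model below are illustrative assumptions, not the authors' exact algorithm. In particular, the scores here are computed exactly from a full fit, whereas the abstract mentions a randomized approximation that this sketch does not implement.

```python
import numpy as np


def influence_scores(X, y):
    """Return d_i = e_i^2 * l_i / (1 - l_i)^2 for every row of X."""
    Q, _ = np.linalg.qr(X)                     # reduced QR: Q has orthonormal columns
    leverage = np.sum(Q ** 2, axis=1)          # l_i = diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                       # e_i = residual of the full OLS fit
    denom = np.clip(1.0 - leverage, 1e-12, None)  # guard against l_i == 1
    return resid ** 2 * leverage / denom ** 2


def influence_subsampled_ols(X, y, n_sub, rng):
    """Refit OLS on rows sampled with probability inversely proportional
    to influence (an assumed sampling rule, chosen for illustration)."""
    d = influence_scores(X, y)
    weights = 1.0 / (d + 1e-12)
    idx = rng.choice(len(y), size=n_sub, replace=False, p=weights / weights.sum())
    beta_sub, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta_sub, idx


def make_corrupted_data(n, p, frac, rng):
    """Hypothetical test data: y comes from clean covariates, but a
    fraction `frac` of the observed rows has additive noise in X."""
    X_true = rng.standard_normal((n, p))
    y = X_true @ np.ones(p) + 0.1 * rng.standard_normal(n)
    corrupted = np.zeros(n, dtype=bool)
    corrupted[rng.choice(n, size=int(frac * n), replace=False)] = True
    X_obs = X_true.copy()
    X_obs[corrupted] += 3.0 * rng.standard_normal((int(corrupted.sum()), p))
    return X_obs, y, corrupted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, corrupted = make_corrupted_data(2000, 5, 0.1, rng)
    d = influence_scores(X, y)
    print("mean influence, corrupted rows:", d[corrupted].mean())
    print("mean influence, clean rows:    ", d[~corrupted].mean())
    _, idx = influence_subsampled_ols(X, y, 200, rng)
    print("corrupted share in subsample:  ", corrupted[idx].mean())
```

In this toy setting, corrupted rows tend to have both larger residuals and larger leverage, so their influence is higher and they are sampled less often. Whether the refit on the subsample is more accurate than the full fit depends on the data and on the sampling rule.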
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Statistical Methods and Inference
