Consistent regression when oblivious outliers overwhelm
Tommaso d'Orsi, Gleb Novikov, David Steurer

TL;DR
This paper shows that consistent robust linear regression is achievable with nearly linear sample size even when an adversary oblivious to the design corrupts most observations, extending previous results to design matrices beyond the Gaussian case and to heavy-tailed noise.
Contribution
It proves the consistency of the Huber loss estimator under minimal assumptions and introduces a simple median-based algorithm for Gaussian designs with optimal guarantees.
Findings
Huber loss estimator is consistent with nearly linear sample size
Optimal error bounds are achieved, matching lower bounds
A simple median-based algorithm works efficiently for Gaussian designs
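The third finding can be illustrated with a small Python sketch. It is not the paper's algorithm, whose details are not given in this summary; it only shows why coordinate-wise medians are a natural tool when the design is Gaussian. Assume the model y = X beta* + eta with standard Gaussian X drawn independently of eta. Each ratio y_i / X_ij then equals beta*_j plus a term that is symmetric around zero, so its median over i concentrates around beta*_j. The function names, the noise distribution, and the inlier fraction alpha = 0.3 are illustrative assumptions.

```python
import numpy as np


def make_instance(n, alpha, seed=0):
    """Hypothetical test instance: noise is fixed first, then a Gaussian design is drawn."""
    rng = np.random.default_rng(seed)
    beta_star = np.array([1.0, -2.0, 0.5, 0.0, 1.5])  # illustrative ground truth
    # Outliers: heavy-tailed and shifted, chosen without looking at X.
    eta = 50.0 + 100.0 * rng.standard_cauchy(n)
    # Inliers: an alpha fraction of entries with magnitude at most 1.
    inliers = rng.choice(n, size=int(alpha * n), replace=False)
    eta[inliers] = rng.uniform(-1.0, 1.0, size=inliers.size)
    X = rng.normal(size=(n, beta_star.size))
    y = X @ beta_star + eta
    return X, y, beta_star


def coordinatewise_median_estimate(X, y):
    """Estimate each beta_j as the median over i of y_i / X_ij."""
    ratios = y[:, None] / X  # shape (n, d); Gaussian entries are nonzero almost surely
    return np.median(ratios, axis=0)


if __name__ == "__main__":
    X, y, beta_star = make_instance(n=20000, alpha=0.3, seed=1)
    estimate = coordinatewise_median_estimate(X, y)
    print("estimation error:", np.linalg.norm(estimate - beta_star))
```

A single pass like this has error that grows with the norm of beta*, because the other coordinates act as extra noise in each ratio. It should be read as an illustration of the median idea rather than as an estimator with the optimal guarantees stated above.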
Abstract
We consider a robust linear regression model $y = X\beta^* + \eta$, where an adversary oblivious to the design $X \in \mathbb{R}^{n \times d}$ may choose $\eta$ to corrupt all but an $\alpha$ fraction of the observations $y$ in an arbitrary way. Prior to our work, even for Gaussian $X$, no estimator for $\beta^*$ was known to be consistent in this model except for quadratic sample size $n \gtrsim (d/\alpha)^2$ or for logarithmic inlier fraction $\alpha \ge 1/\log n$. We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction. Concretely, we show that the Huber loss estimator is consistent for every sample size $n = \omega(d/\alpha^2)$ and achieves an error rate of $O(d/(\alpha^2 n))^{1/2}$. Both bounds are optimal (up to constant factors). Our results extend to designs far beyond the Gaussian case and only require the column span of …
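The setting in the abstract can be simulated in a few lines of Python. The sketch below writes the model as y = X beta* + eta, fixes the noise before drawing a Gaussian design so that it is oblivious to X, and minimizes the Huber loss by iteratively reweighted least squares. The solver, the transition point h = 2, the outlier distribution, and all function names are assumptions made for illustration; the summary does not say how the estimator is computed.

```python
import numpy as np


def huber_regression(X, y, h=2.0, n_iter=200, tol=1e-10):
    """Minimize sum_i Phi_h(y_i - <x_i, beta>) by iteratively reweighted least squares.

    Phi_h(t) = t^2 / 2 for |t| <= h and h * |t| - h^2 / 2 otherwise.
    """
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 in the quadratic region, h / |r| in the linear region.
        w = h / np.maximum(np.abs(r), h)
        XtW = X.T * w  # shape (d, n), each column scaled by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.linalg.norm(beta_new - beta) <= tol:
            return beta_new
        beta = beta_new
    return beta


def make_oblivious_instance(n, d, alpha, seed=0):
    """Hypothetical instance: only an alpha fraction of noise entries is bounded by 1."""
    rng = np.random.default_rng(seed)
    beta_star = rng.normal(size=d)
    # The noise is generated before the design, so it cannot depend on X.
    eta = 50.0 + 100.0 * rng.standard_cauchy(n)
    inliers = rng.choice(n, size=int(alpha * n), replace=False)
    eta[inliers] = rng.uniform(-1.0, 1.0, size=inliers.size)
    X = rng.normal(size=(n, d))
    y = X @ beta_star + eta
    return X, y, beta_star


if __name__ == "__main__":
    X, y, beta_star = make_oblivious_instance(n=20000, d=5, alpha=0.3)
    beta_huber = huber_regression(X, y)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    print("Huber error:", np.linalg.norm(beta_huber - beta_star))
    print("OLS error:  ", np.linalg.norm(beta_ols - beta_star))
```

In this simulation 70% of the observations carry shifted, heavy-tailed noise with no finite mean, so ordinary least squares is expected to be far off, while the Huber fit caps the influence of each large residual at h. Comparing the two printed errors for growing n gives a rough empirical feel for the consistency claim.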
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms
Methods: Huber loss · Linear Regression
