Consistent regression when oblivious outliers overwhelm

Tommaso d'Orsi; Gleb Novikov; David Steurer

arXiv:2009.14774 · cs.LG · May 26, 2021 · 1 citation

TL;DR

This paper shows that consistent robust linear regression is possible with nearly linear sample size even when an oblivious adversary corrupts all but a small fraction of the observations, extending previous results to broader design matrices and heavy-tailed noise.
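To make "nearly linear" concrete, here is a back-of-the-envelope comparison of the two sample-size thresholds quoted in the abstract, using illustrative values $d = 100$ and $\alpha = 0.01$ (our choice, not the paper's):

$$\frac{d}{\alpha^{2}} = \frac{100}{10^{-4}} = 10^{6}, \qquad \Big(\frac{d}{\alpha}\Big)^{2} = \Big(\frac{100}{10^{-2}}\Big)^{2} = 10^{8}.$$

Because $(d/\alpha)^{2} = d \cdot d/\alpha^{2}$, the earlier quadratic requirement is larger by exactly a factor of $d$, here 100. The new result asks for $n = \omega(d/\alpha^{2})$, meaning $n$ must exceed $d/\alpha^{2}$ by a growing factor, so $10^{6}$ indicates the scale rather than a sufficient sample size.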

Contribution

The paper proves that the Huber loss estimator is consistent under mild assumptions on the design matrix and introduces a simple median-based algorithm with optimal guarantees for Gaussian designs.
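For reference, the Huber loss estimator in its standard textbook form, writing $x_i$ for the $i$-th row of $X$ and $h > 0$ for a threshold (the normalization and the choice of $h$ are assumptions here; the paper's exact parametrization may differ):

$$\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^{d}} \frac{1}{n}\sum_{i=1}^{n} \Phi_h\big(y_i - \langle x_i, \beta\rangle\big), \qquad \Phi_h(t) = \begin{cases} \tfrac{1}{2}t^{2}, & |t| \le h,\\ h|t| - \tfrac{1}{2}h^{2}, & |t| > h. \end{cases}$$

Its derivative is the clipped residual $\Phi_h'(t) = \max(-h, \min(h, t))$, so each observation's residual enters the gradient with magnitude at most $h$ no matter how large its corruption. This is the intuitive reason huge outliers cannot drag $\hat{\beta}$ arbitrarily far.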

Findings

01

The Huber loss estimator is consistent with nearly linear sample size, $n = \omega(d/\alpha^{2})$

02

The Huber estimator's error rate $O\big((d/(\alpha^{2} n))^{1/2}\big)$ matches lower bounds up to constant factors

03

A simple median-based algorithm works efficiently for Gaussian designs
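Finding 03 refers to a median-based algorithm for Gaussian designs, but its exact form is not given on this page. As a hypothetical illustration of why medians help here (not necessarily the authors' procedure), the sketch below iterates a coordinate-wise median step. For a fixed current guess $\beta$ and Gaussian $X$ independent of $\eta$, each ratio $r_i / X_{ij}$ of residual to design entry is symmetrically distributed around $\beta^{*}_j - \beta_j$, so its median estimates that offset even when most rows are outliers. The helper names (`simulate`, `median_regression`), the constants and the outlier model are assumptions, and for simplicity the sketch reuses the same samples in every iteration.

```python
import math
import random
import statistics


def simulate(n, d, alpha, beta_star, seed=1):
    """Sample y = X beta* + eta with a Gaussian design X.

    Hypothetical oblivious adversary: eta is drawn without looking at X.
    Each row is an inlier (eta_i = 0) with probability alpha; otherwise
    eta_i is a large value of random sign.
    """
    rng = random.Random(seed)
    eta = [0.0 if rng.random() < alpha
           else rng.choice((-1.0, 1.0)) * rng.uniform(50.0, 500.0)
           for _ in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(xi, beta_star)) + e for xi, e in zip(X, eta)]
    return X, y


def median_regression(X, y, iterations=20):
    """Hypothetical coordinate-wise median iteration for Gaussian designs.

    For a fixed beta, r_i / X_ij is symmetric around beta*_j - beta_j
    (X_ij is symmetric and independent of the other terms), so the median
    of these ratios estimates how far coordinate j still has to move.
    """
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(iterations):
        # Residuals under the current guess.
        resid = [yi - sum(a * b for a, b in zip(xi, beta)) for xi, yi in zip(X, y)]
        # One median per coordinate; skip exact zeros to avoid division by zero.
        step = [statistics.median(r / xi[j] for r, xi in zip(resid, X) if xi[j] != 0.0)
                for j in range(d)]
        beta = [b + s for b, s in zip(beta, step)]
    return beta


beta_star = [1.0, -2.0, 0.5, 0.0, 3.0]
X, y = simulate(n=10000, d=5, alpha=0.2, beta_star=beta_star)
beta_hat = median_regression(X, y)
error = math.dist(beta_hat, beta_star)
print(f"median sketch error: {error:.2e}")
```

Here roughly 80% of the responses are shifted by 50 to 500 units. Outlier ratios land far from the center with a random sign, so they roughly cancel in the median, while the inliers pin it down. The exact inlier responses in this toy setup are why the error can shrink far below the noisy-inlier regime the paper analyzes.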

Abstract

We consider a robust linear regression model $y = X\beta^{*} + \eta$, where an adversary oblivious to the design $X \in \mathbb{R}^{n \times d}$ may choose $\eta$ to corrupt all but an $\alpha$ fraction of the observations $y$ in an arbitrary way. Prior to our work, even for Gaussian $X$, no estimator for $\beta^{*}$ was known to be consistent in this model except for quadratic sample size $n \gtrsim (d/\alpha)^{2}$ or for logarithmic inlier fraction $\alpha \geq 1/\log n$. We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction. Concretely, we show that the Huber loss estimator is consistent for every sample size $n = \omega(d/\alpha^{2})$ and achieves an error rate of $O\big((d/(\alpha^{2} n))^{1/2}\big)$. Both bounds are optimal (up to constant factors). Our results extend to designs far beyond the Gaussian case and only require the column span of $X$ …
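A minimal pure-Python sketch of the setting in the abstract follows. It samples a Gaussian design, lets a hypothetical oblivious adversary corrupt roughly 70% of the responses with large noise drawn independently of $X$, and fits the Huber loss estimator by plain gradient descent. The helper names (`simulate`, `huber_psi`, `huber_regression`), the threshold $h = 1$, the step size, the iteration count and the outlier distribution are illustrative assumptions. The paper analyzes the estimator itself, not this particular solver.

```python
import math
import random


def simulate(n, d, alpha, beta_star, seed=0):
    """Sample y = X beta* + eta with a Gaussian design X.

    Hypothetical oblivious adversary: eta is drawn without looking at X.
    Each row is an inlier (eta_i = 0) with probability alpha; otherwise
    eta_i is a large value of random sign.
    """
    rng = random.Random(seed)
    eta = [0.0 if rng.random() < alpha
           else rng.choice((-1.0, 1.0)) * rng.uniform(50.0, 500.0)
           for _ in range(n)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(xi, beta_star)) + e for xi, e in zip(X, eta)]
    return X, y


def huber_psi(t, h):
    """Derivative of the Huber loss: the residual clipped to [-h, h]."""
    return max(-h, min(h, t))


def huber_regression(X, y, h=1.0, lr=1.0, iterations=150):
    """Minimize (1/n) * sum_i Huber_h(y_i - <x_i, beta>) by gradient descent."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(iterations):
        # Accumulates sum_i psi(r_i) * x_i, which equals -n times the gradient.
        descent = [0.0] * d
        for xi, yi in zip(X, y):
            r = yi - sum(a * b for a, b in zip(xi, beta))
            g = huber_psi(r, h)
            for j in range(d):
                descent[j] += g * xi[j]
        beta = [b + lr * s / n for b, s in zip(beta, descent)]
    return beta


beta_star = [1.0, -2.0, 0.5]
X, y = simulate(n=2000, d=3, alpha=0.3, beta_star=beta_star)
beta_hat = huber_regression(X, y)
error = math.dist(beta_hat, beta_star)
print(f"Huber estimate {beta_hat}, error {error:.3f}")
```

With $n = 2000$, $d = 3$ and $\alpha = 0.3$, the abstract's rate suggests an error on the order of $(d/(\alpha^{2} n))^{1/2} \approx 0.13$, up to constants. Because each outlier's residual is clipped to $\pm h$ and its sign is independent of $x_i$, the outliers' pulls largely cancel instead of dominating the fit.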

Videos

Consistent regression when oblivious outliers overwhelm · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms

Methods: Huber loss · Linear Regression