Distribution-Free Robust Linear Regression

Jaouad Mourtada; Tomas Vaškevičius; Nikita Zhivotovskiy

arXiv:2102.12919 · math.ST · February 25, 2022

TL;DR

This paper studies random design linear regression with no assumptions on the covariate distribution and a heavy-tailed response, and constructs an estimator whose excess risk is of order $d / n$ with an optimal sub-exponential tail.

Contribution

It introduces a novel non-linear estimator combining truncated least squares, median-of-means, and aggregation, achieving optimal excess risk in heavy-tailed, distribution-free settings.
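As a rough illustration of one ingredient named above, the following is a minimal sketch of the textbook median-of-means estimator for a scalar mean. It is not the paper's regression estimator, which combines this idea with truncation and aggregation; the function name `median_of_means` and the parameters `num_blocks` and `seed` are hypothetical choices made for this example.

```python
import random
import statistics


def median_of_means(values, num_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`.

    Illustrative only: a scalar-mean version of the idea, not the
    regression procedure from the paper. `num_blocks` and `seed` are
    hypothetical parameters chosen for this sketch.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")

    # Shuffle a copy so the block assignment does not depend on input order.
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)

    # Deal points round-robin so block sizes differ by at most one.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]

    # Average within each block, then take the median across blocks.
    block_means = [statistics.fmean(block) for block in blocks]
    return statistics.median(block_means)
```

A single extreme observation can corrupt at most one block mean, so the median across blocks is unaffected as long as most blocks are clean. With `num_blocks=1` the function reduces to the ordinary sample mean.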

Findings

01

An optimal version of the classical in-expectation bound holds for the truncated least squares estimator.

02

Despite its optimal in-expectation performance, the truncated least squares procedure fails with constant probability for some distributions.

03

The proposed estimator attains excess risk of order $d / n$ with an optimal sub-exponential tail.

Abstract

We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order $d / n$ with an optimal sub-exponential tail.…
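The truncated least squares estimator mentioned in the abstract can be sketched as follows: fit ordinary least squares, then clip the resulting predictions to a bounded interval. This is a minimal sketch under assumptions; the function name `truncated_least_squares` and the parameter `level` are hypothetical, and how the truncation level is chosen in the paper is not stated here.

```python
import numpy as np


def truncated_least_squares(X, y, level):
    """Fit least squares and return a predictor clipped to [-level, level].

    Illustrative sketch: `level` is a hypothetical truncation threshold
    supplied by the caller, not a value prescribed by the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Minimum-norm least squares solution; also works for rank-deficient X.
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(X_new):
        # Linear prediction followed by truncation, which makes the
        # resulting predictor non-linear in the covariates.
        raw = np.asarray(X_new, dtype=float) @ weights
        return np.clip(raw, -level, level)

    return predict
```

For example, fitting on data with $y = 2x$ and `level=5.0` gives a prediction of 2 at $x = 1$ but a truncated prediction of 5 at $x = 10$, where the untruncated linear fit would return 20.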

Taxonomy

Methods: Linear Regression