Distribution-Free Robust Linear Regression
Jaouad Mourtada, Tomas Vaškevičius, Nikita Zhivotovskiy

TL;DR
This paper develops a robust linear regression method that makes no assumptions on the distribution of the covariates and allows a heavy-tailed response, achieving excess risk of order d/n with an optimal sub-exponential tail.
Contribution
It introduces a novel non-linear estimator combining truncated least squares, median-of-means, and aggregation, achieving optimal excess risk in heavy-tailed, distribution-free settings.
Findings
Optimal in-expectation bound for truncated least squares estimator.
Failure of classical procedures with constant probability for some distributions.
Proposed estimator attains excess risk of order d/n with sub-exponential tail.
Abstract
We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order d/n with an optimal sub-exponential tail.…
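To make the starting point of the abstract concrete, here is a minimal sketch of a truncated least squares predictor: fit ordinary least squares, then clip its predictions to a bounded range. This is only an illustration and not the paper's final estimator, which also involves median-of-means and aggregation steps. The function name `truncated_least_squares`, the threshold value, and the synthetic Student-t noise are assumptions chosen for the example.

```python
import numpy as np

def truncated_least_squares(X, y, threshold):
    """Fit OLS, then return a predictor whose outputs are clipped to
    [-threshold, threshold]. The threshold is an assumed tuning input."""
    # Least squares solution (minimum-norm if X is rank deficient).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(X_new):
        # Truncation makes the predictor non-linear in the covariates.
        return np.clip(X_new @ w, -threshold, threshold)

    return predict, w

# Synthetic data (assumed for illustration): linear signal plus
# heavy-tailed noise with finite variance (Student-t, 3 degrees of freedom).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ np.ones(d) + rng.standard_t(df=3, size=n)

predict, w_hat = truncated_least_squares(X, y, threshold=3.0)
preds = predict(X)
```

Because of the clipping step, the resulting predictor is no longer a linear function of the covariates, which is the sense in which such estimators are non-linear.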
Taxonomy
Methods: Linear Regression
