High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality
Urte Adomaityte, Leonardo Defilippis, Bruno Loureiro, and Gabriele Sicuro

TL;DR
This paper analyzes the asymptotic behavior of robust high-dimensional regression estimators under heavy-tailed data, revealing limitations of common methods and offering insight into optimal regularization and excess-risk decay rates.
Contribution
It provides a sharp asymptotic characterization of M-estimators and ridge regression in heavy-tailed settings, highlighting the need for regularization and uncovering phase transitions.
Findings
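Even with an optimally tuned location parameter, the Huber loss is suboptimal in high dimensions under heavy-tailed noise unless further regularization is added.
The decay rate of the excess risk of ridge regression depends on whether second moments exist.
The asymptotic formulas extend to generalized linear models and mixture distributions.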
Abstract
We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a transition in the regularisation strength $\lambda$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal…
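To make the objects in the abstract concrete, here is a minimal sketch of a ridge-regularised Huber M-estimator fitted by plain gradient descent. It is an illustration only, not the authors' method or code: the function names, the threshold `delta` (assumed here to play the role of the Huber parameter tuned in the abstract), the regularisation strength `lam`, the 1/n normalisation of the objective, and the choice of optimiser are all assumptions made for this example.

```python
import numpy as np


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))


def huber_grad(residual, delta=1.0):
    """Derivative of the Huber loss with respect to the residual."""
    return np.clip(residual, -delta, delta)


def objective(w, X, y, delta=1.0, lam=0.1):
    """Mean Huber loss plus a ridge penalty (lam / 2) * ||w||^2."""
    return huber_loss(y - X @ w, delta).mean() + 0.5 * lam * np.dot(w, w)


def fit_ridge_huber(X, y, delta=1.0, lam=0.1, n_iter=500):
    """Gradient descent on the objective above (hypothetical helper)."""
    n, d = X.shape
    w = np.zeros(d)
    # The Huber loss has curvature at most 1, so the gradient of the
    # objective is Lipschitz with constant ||X||_2^2 / n + lam.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam
    for _ in range(n_iter):
        residual = y - X @ w
        grad = -X.T @ huber_grad(residual, delta) / n + lam * w
        w -= grad / lipschitz
    return w
```

A typical use would draw Gaussian covariates, add heavy-tailed noise (for example Student-t noise with fewer than two degrees of freedom, so that the variance does not exist), and compare the fitted weights for several values of `delta` and `lam`.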
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models
Methods: Huber loss
