High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality

Urte Adomaityte, Leonardo Defilippis, Bruno Loureiro, and Gabriele Sicuro

arXiv:2309.16476 · math.ST · June 3, 2024

Open Access

TL;DR

This paper characterizes the asymptotic behavior of robust high-dimensional regression estimators under heavy-tailed data, showing where common methods such as the Huber loss fall short and deriving decay rates for the excess risk of ridge regression.

Contribution

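The paper provides a sharp asymptotic characterization of M-estimators and ridge regression in heavy-tailed settings, including cases where second and higher moments do not exist. It highlights the need for further regularization and uncovers a transition in the optimally tuned Huber parameter as a function of sample complexity and contamination.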
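
For orientation, a regularized M-estimator of the kind discussed here is commonly written as below. This is a generic textbook form, not the paper's own notation: the symbols $w$, $x_i$, $y_i$, $n$, $d$, $\rho$ and $\lambda$ are assumptions introduced for illustration.

$$
\hat{w}_{\lambda} = \arg\min_{w \in \mathbb{R}^{d}} \; \sum_{i=1}^{n} \rho\left(y_i - w^{\top} x_i\right) + \frac{\lambda}{2} \lVert w \rVert_2^2
$$

Ridge regression corresponds to the square loss $\rho(r) = r^2/2$, while the Huber loss with parameter $\delta > 0$ is

$$
\rho_{\delta}(r) =
\begin{cases}
\frac{1}{2} r^{2}, & |r| \le \delta, \\
\delta \left( |r| - \frac{1}{2}\delta \right), & |r| > \delta.
\end{cases}
$$

The Huber loss is quadratic for small residuals and grows only linearly for large ones, which is what limits the influence of heavy-tailed noise.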

Findings

01

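Even with an optimally tuned parameter, the Huber loss is suboptimal in the high-dimensional regime under heavy-tailed noise unless further regularization is added.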

02

The decay rate of ridge regression's excess risk depends on whether second moments exist.

03

Formulas extend to generalized linear models and mixture distributions.

Abstract

We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal…
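
To make the objects in the abstract concrete, here is a minimal sketch of a ridge-regularised Huber M-estimator fitted by plain gradient descent. It is an illustration under stated assumptions, not the authors' method or experimental setup: the covariates are Gaussian (the paper considers a broader elliptical family), the noise is Student-t with 1.5 degrees of freedom (so its variance is infinite), and the function names, the values of `delta` and `lam`, and the problem sizes are all hypothetical choices.

```python
import numpy as np

def huber_loss(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def huber_grad(r, delta):
    """Derivative of the Huber loss in r: the residual clipped to [-delta, delta]."""
    return np.clip(r, -delta, delta)

def fit_huber_ridge(X, y, delta=1.0, lam=0.1, n_iter=2000):
    """Minimise mean Huber loss + (lam / 2) * ||w||^2 by gradient descent."""
    n, d = X.shape
    # Smoothness bound: the Huber loss has second derivative at most 1,
    # so the gradient is Lipschitz with constant <= sigma_max(X)^2 / n + lam.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam
    w = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ w
        grad = -X.T @ huber_grad(r, delta) / n + lam * w
        w -= grad / lipschitz
    return w

# Synthetic data (all choices below are illustrative assumptions).
rng = np.random.default_rng(0)
n, d = 400, 50
w_star = rng.normal(size=d) / np.sqrt(d)   # hypothetical ground-truth weights
X = rng.normal(size=(n, d))                # Gaussian covariates
noise = rng.standard_t(df=1.5, size=n)     # heavy-tailed noise, infinite variance
y = X @ w_star + noise

w_hat = fit_huber_ridge(X, y, delta=1.0, lam=0.1)
err = np.sum((w_hat - w_star) ** 2)        # squared estimation error
print(f"squared estimation error: {err:.4f}")
```

Sweeping `delta` and `lam` over a grid in this sketch is one way to explore, on synthetic data, how the two parameters interact; it does not reproduce the asymptotic formulas of the paper.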

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models

Methods: Huber loss