Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression
Kai Tan, Pierre C. Bellec

TL;DR
This paper develops estimators to accurately track the generalization error of gradient-based algorithms in high-dimensional robust regression, enabling optimal stopping and improved understanding of model performance.
Contribution
It introduces consistent estimators for the generalization error along the trajectory of GD, SGD, and proximal variants in high-dimensional robust regression with heavy-tailed errors.
Findings
Estimators accurately track the generalization error in Huber regression, pseudo-Huber regression, and their penalized variants with a non-smooth regularizer.
Proposed risk estimates effectively serve as proxies for actual generalization error.
Simulations confirm the estimators' effectiveness in practical scenarios.
Abstract
This paper studies the generalization performance of iterates obtained by Gradient Descent (GD), Stochastic Gradient Descent (SGD) and their proximal variants in high-dimensional robust regression problems. The number of features is comparable to the sample size and errors may be heavy-tailed. We introduce estimators that precisely track the generalization error of the iterates along the trajectory of the iterative algorithm. These estimators are provably consistent under suitable conditions. The results are illustrated through several examples, including Huber regression, pseudo-Huber regression, and their penalized variants with a non-smooth regularizer. We provide explicit generalization error estimates for iterates generated from GD and SGD, or from proximal SGD in the presence of a non-smooth regularizer. The proposed risk estimates serve as effective proxies for the actual…
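To make the setting concrete, here is a minimal simulation sketch of proximal SGD on an L1-penalized Huber regression problem. It is not the paper's estimator. It computes the true generalization error along the trajectory using the known coefficient vector, which is the quantity the paper's estimators aim to track when that vector is unknown. Everything specific here is an assumption chosen for illustration: the function names, the identity feature covariance, the Student-t noise with 2 degrees of freedom, the L1 penalty, and all hyperparameters.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss with respect to the residual r."""
    return np.clip(r, -delta, delta)

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_sgd(X, y, lam=0.05, step=0.05, batch=50, n_iter=300, delta=1.0, seed=0):
    """Run proximal SGD on L1-penalized Huber regression; return all iterates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = np.zeros(p)
    path = [b.copy()]
    for _ in range(n_iter):
        idx = rng.choice(n, size=batch, replace=False)   # sample a mini-batch
        resid = y[idx] - X[idx] @ b
        grad = -X[idx].T @ huber_grad(resid, delta) / batch
        b = soft_threshold(b - step * grad, step * lam)  # gradient step, then prox
        path.append(b.copy())
    return np.array(path)

def simulate(n=400, p=200, s=10, seed=0):
    """Hypothetical data: p comparable to n, sparse signal, heavy-tailed noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))          # identity covariance (assumption)
    b_star = np.zeros(p)
    b_star[:s] = 1.0
    noise = rng.standard_t(df=2, size=n)     # infinite-variance noise (assumption)
    return X, X @ b_star + noise, b_star

X, y, b_star = simulate()
path = proximal_sgd(X, y)

# With identity covariance, E[(x^T (b_t - b_star))^2] equals ||b_t - b_star||^2.
risk = np.sum((path - b_star) ** 2, axis=1)
t_best = int(np.argmin(risk))
print(f"initial risk {risk[0]:.3f}, best risk {risk[t_best]:.3f} at iteration {t_best}")
```

In practice `b_star` is unknown, so `risk` cannot be computed this way. A data-driven estimate of this curve is what would allow the stopping iteration `t_best` to be chosen from the observed data alone.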
Taxonomy
Topics: Face and Expression Recognition · Advanced Statistical Methods and Models · Grey System Theory Applications
Methods: Stochastic Gradient Descent
