Gradient descent inference in empirical risk minimization
Qiyang Han, Xiaocong Xu

TL;DR
This paper develops a non-asymptotic, joint distributional analysis of gradient descent in high-dimensional empirical risk minimization, enabling statistical inference without requiring convergence.
Contribution
It introduces a novel state evolution theory for gradient descent that applies to non-convex losses and non-Gaussian data, facilitating debiased inference at each iteration.
Findings
After debiasing, gradient descent iterates are approximately normal at each iteration.
The proposed inference method is robust to model misspecification.
The method provides estimates of the generalization error during training.
Abstract
Gradient descent is one of the most widely used iterative algorithms in modern statistical learning. However, its precise algorithmic dynamics in high-dimensional settings remain only partially understood, which has limited its broader potential for statistical inference applications. This paper provides a precise, non-asymptotic joint distributional characterization of gradient descent iterates and their debiased statistics in a broad class of empirical risk minimization problems, in the so-called mean-field regime where the sample size is proportional to the signal dimension. Our non-asymptotic state evolution theory holds for both general non-convex loss functions and non-Gaussian data, and reveals the central role of two Onsager correction matrices that precisely characterize the non-trivial dependence among all gradient descent iterates in the mean-field regime. Leveraging the…
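To make the setting concrete, the following is a minimal sketch of gradient descent on an empirical risk in the proportional regime, where the sample size n is a fixed multiple of the dimension d. It is not the paper's state evolution or debiasing procedure. The squared loss, Gaussian design, linear model, step size, and the names `gradient_descent_erm` and `simulate` are all assumptions chosen for illustration. The error it reports is an oracle quantity computed from the true signal, which a simulation knows but a practitioner would not.

```python
import numpy as np


def gradient_descent_erm(X, y, step_size, num_iters):
    """Run plain gradient descent on the squared-loss empirical risk
    (1 / (2n)) * ||X @ mu - y||^2 and return every iterate.

    Assumption: squared loss is used only for illustration.
    """
    n, d = X.shape
    mu = np.zeros(d)
    iterates = [mu.copy()]
    for _ in range(num_iters):
        residual = X @ mu - y
        grad = X.T @ residual / n  # gradient of the empirical risk
        mu = mu - step_size * grad
        iterates.append(mu.copy())
    return np.array(iterates)


def simulate(n=600, d=300, noise_std=0.5, step_size=0.1, num_iters=50, seed=0):
    """Simulate a linear model with n proportional to d (here n / d = 2).

    All parameter values are hypothetical defaults for this sketch.
    """
    rng = np.random.default_rng(seed)
    mu_star = rng.standard_normal(d) / np.sqrt(d)  # true signal, norm about 1
    X = rng.standard_normal((n, d))                # isotropic Gaussian design
    y = X @ mu_star + noise_std * rng.standard_normal(n)

    iterates = gradient_descent_erm(X, y, step_size, num_iters)

    # Oracle estimation error ||mu_t - mu_star||^2 at every iteration.
    # With isotropic features this equals the excess test error, but it
    # needs mu_star, so it is a simulation-only diagnostic.
    est_err = np.sum((iterates - mu_star) ** 2, axis=1)
    return iterates, est_err


if __name__ == "__main__":
    iterates, est_err = simulate()
    for t in (0, 10, 25, 50):
        print(f"iteration {t:2d}: oracle error = {est_err[t]:.4f}")
```

Tracking the error at every iteration, rather than only at convergence, mirrors the abstract's focus on the whole trajectory of iterates. The sketch substitutes an oracle error for the data-driven estimates the paper develops.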
Taxonomy
Topics
Reservoir Engineering and Simulation Methods
