Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation
Matthew Esmaili Mallory, Kevin Han Huang, Morgane Austern

TL;DR
This paper extends two foundational results of high-dimensional statistics, Gaussian universality and the convex Gaussian min-max theorem (CGMT), to dependent data. It proves that the asymptotic risk of logistic regression remains universal under dependence and introduces a new CGMT framework for dependent data, with applications to data augmentation in deep learning.
Contribution
It generalizes Gaussian universality and CGMT to dependent data, enabling analysis of high-dimensional models with correlated observations and covariates.
Findings
Gaussian universality holds for high-dimensional logistic regression under block dependence, m-dependence, and special cases of mixing.
A novel CGMT framework for correlated data is established.
Data augmentation impacts asymptotic risk in deep learning.
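For readers unfamiliar with the term, the standard textbook definition of m-dependence can be written as follows. This is the usual definition from the probability literature; the paper's exact formulation, which this summary does not reproduce, may differ in detail.

```latex
% A sequence (X_i)_{i \ge 1} is m-dependent if, for every k \ge 1,
\[
  \sigma(X_1, \dots, X_k)
  \quad\text{and}\quad
  \sigma(X_{k+m+1}, X_{k+m+2}, \dots)
  \quad\text{are independent.}
\]
% Taking m = 0 recovers a sequence of independent random variables.
```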
Abstract
Over the last decade, a wave of research has characterized the exact asymptotic risk of many high-dimensional models in the proportional regime. Two foundational results have driven this progress: Gaussian universality, which shows that the asymptotic risk of estimators trained on non-Gaussian and Gaussian data is equivalent, and the convex Gaussian min-max theorem (CGMT), which characterizes the risk under Gaussian settings. However, these results rely on the assumption that the data consists of independent random vectors, an assumption that significantly limits their applicability to many practical setups. In this paper, we address this limitation by generalizing both results to the dependent setting. More precisely, we prove that Gaussian universality still holds for high-dimensional logistic regression under block dependence, m-dependence and special cases of mixing, and establish a…
Taxonomy
Methods: Logistic Regression
