Statistical Inference in Classification of High-dimensional Gaussian Mixture
Hanwen Huang, Peng Zeng

TL;DR
This paper analyzes the asymptotic behavior of regularized classifiers, especially $L_1$-regularized logistic regression, in high-dimensional Gaussian mixture models, focusing on generalization error and variable selection.
Contribution
It introduces a replica-method-based analysis of high-dimensional Gaussian mixture classification, characterizing both estimator performance (generalization error) and variable selection.
Findings
Analytical expressions for generalization error in high dimensions
Validation of theoretical results with finite-sample simulations
Impact of covariance structure on variable selection performance
Abstract
We consider the classification problem of a high-dimensional mixture of two Gaussians with general covariance matrices. Using the replica method from statistical physics, we investigate the asymptotic behavior of a general class of regularized convex classifiers in the high-dimensional limit, where both the sample size and the dimension approach infinity while their ratio remains fixed. Our focus is on the generalization error and variable selection properties of the estimators. Specifically, based on the distributional limit of the classifier, we construct a de-biased estimator to perform variable selection through an appropriate hypothesis testing procedure. Using $L_1$-regularized logistic regression as an example, we conducted extensive computational experiments to confirm that our analytical findings are consistent with numerical simulations in finite-sized…
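The finite-sample setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code and does not implement the replica analysis or the de-biased estimator. It only draws a two-class Gaussian mixture, fits $L_1$-regularized logistic regression, and measures test error and the selected support. Everything specific is an assumption made for illustration: identity covariance (the paper allows general covariance), a sparse mean vector `mu`, the values of `n`, `p`, and `lam`, and the use of a plain proximal-gradient (ISTA) solver.

```python
import numpy as np

def simulate_mixture(n, p, mu, rng):
    """Draw n points with labels y = +/-1 (equal probability), x ~ N(y * mu, I_p).
    Identity covariance is an illustrative assumption."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, p))
    return x, y

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_logistic(x, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for
    (1/n) * sum_i log(1 + exp(-y_i * x_i.w)) + lam * ||w||_1, without intercept."""
    n, p = x.shape
    # The logistic-loss gradient is Lipschitz with constant ||X||_2^2 / (4n).
    step = 4.0 * n / np.linalg.norm(x, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (x @ w)
        # sigmoid(-margin), written with tanh for numerical stability
        sig = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(x.T @ (y * sig)) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
n, p = 400, 200            # sample size and dimension, ratio n/p = 2 (hypothetical values)
mu = np.zeros(p)
mu[:10] = 0.6              # hypothetical sparse signal: 10 informative coordinates

x_train, y_train = simulate_mixture(n, p, mu, rng)
x_test, y_test = simulate_mixture(5000, p, mu, rng)

w_hat = fit_l1_logistic(x_train, y_train, lam=0.05)

# Empirical generalization error: fraction of misclassified test points.
test_error = np.mean(np.sign(x_test @ w_hat) != y_test)
selected = np.flatnonzero(w_hat)   # coordinates with nonzero estimated weight

print(f"test error: {test_error:.3f}")
print(f"selected {selected.size} of {p} coordinates")
```

Repeating the run over a grid of `n/p` ratios or `lam` values gives empirical error curves of the kind the abstract compares against its asymptotic predictions. Selecting variables by the nonzero pattern of `w_hat` is the naive alternative to the hypothesis-testing procedure the abstract describes.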
Taxonomy
Topics: Statistical and Computational Modeling
Methods: Logistic Regression (focus)
