High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization
Jiahui Cheng, Minshuo Chen, Hao Liu, Tuo Zhao, Wenjing Liao

TL;DR
This paper analyzes high-dimensional binary classification under label shift. It reveals a phase transition: in certain overparametrized regimes, a classifier trained on imbalanced data outperforms one trained on a reduced, balanced subset of the data. It also examines how regularization affects this phenomenon.
Contribution
It provides a new asymptotic analysis of the Fisher Linear Discriminant under label shift in overparametrized regimes. The analysis identifies a phase transition phenomenon and characterizes the effect of regularization on it.
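For reference, a standard textbook form of the (ridge-regularized) Fisher Linear Discriminant is written out below, together with its balanced test error under a simple Gaussian model. The estimator details (pooled covariance, Moore-Penrose pseudo-inverse when λ = 0, midpoint threshold) and the isotropic model with class means ±μ are assumptions for illustration. They may differ from the exact setting analyzed in the paper. Label shift enters through the training class sizes n_0 ≠ n_1, while the error below is measured on a balanced test distribution.

```latex
% Training data: n_k samples from class k \in \{0,1\}, n = n_0 + n_1, x_i \in \mathbb{R}^p
\hat\mu_k = \frac{1}{n_k}\sum_{i:\,y_i=k} x_i, \qquad
\hat\Sigma = \frac{1}{n-2}\sum_{k\in\{0,1\}}\;\sum_{i:\,y_i=k}(x_i-\hat\mu_k)(x_i-\hat\mu_k)^\top

% Ridge version for \lambda > 0; for \lambda = 0, \hat\Sigma^{+} is the pseudo-inverse
% (needed when p > n - 2, where \hat\Sigma is singular)
\hat w_\lambda = (\hat\Sigma + \lambda I_p)^{-1}(\hat\mu_1 - \hat\mu_0), \qquad
\hat w_0 = \hat\Sigma^{+}(\hat\mu_1 - \hat\mu_0)

% Midpoint threshold (equal test priors) and decision rule
\hat b = -\tfrac12\,\hat w^\top(\hat\mu_0 + \hat\mu_1), \qquad
\hat y(x) = \mathbf{1}\{\hat w^\top x + \hat b > 0\}

% Balanced test error if x \mid y=1 \sim N(\mu, I_p) and x \mid y=0 \sim N(-\mu, I_p)
\mathcal{E}(\hat w, \hat b)
  = \tfrac12\,\Phi\!\Big(-\frac{\hat w^\top\mu + \hat b}{\|\hat w\|}\Big)
  + \tfrac12\,\Phi\!\Big(-\frac{\hat w^\top\mu - \hat b}{\|\hat w\|}\Big)
```

Each term follows because \hat w^\top x is Gaussian with mean \pm\hat w^\top\mu and variance \|\hat w\|^2 under the assumed model. The error is minimized at \hat w \propto \mu, \hat b = 0, where it equals the Bayes error \Phi(-\|\mu\|).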
Findings
Existence of a phase transition: in certain overparametrized regimes, a classifier trained on imbalanced data can outperform one trained on reduced balanced data.
Regularization can eliminate the phase transition effect.
Overparametrization impacts the generalization performance under label shift.
Abstract
Label shift has been widely believed to be harmful to the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate the impact of label shift, e.g., balancing the training data. However, these methods usually consider the underparametrized regime, where the sample size is much larger than the data dimension. Research on the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that there exists a phase transition phenomenon: in certain overparametrized regimes, the classifier trained using imbalanced data outperforms the counterpart trained with reduced balanced data. Moreover, we investigate the impact of regularization on label shift: The aforementioned phase transition…
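The comparison described in the abstract (full imbalanced data versus a reduced balanced subset) can be explored with a small simulation. This is a minimal sketch under assumptions not taken from the paper: two isotropic Gaussian classes with means ±μ, a pooled-covariance Fisher Linear Discriminant with a midpoint threshold, the pseudo-inverse when λ = 0, a balanced test distribution, and "reduced balanced data" implemented as subsampling the majority class down to the minority size. The helper names (`fit_fld`, `balanced_error`, `sample`, `compare`) are hypothetical. The sketch does not reproduce the paper's asymptotic phase boundary. It only lets one vary p, the class sizes, the signal strength and λ, and see which training set yields the lower test error.

```python
# Hypothetical simulation sketch; the data model and estimator are assumptions.
import numpy as np
from math import erfc, sqrt


def fit_fld(X, y, lam=0.0):
    """Fisher Linear Discriminant w = (S + lam*I)^{-1} (mu1 - mu0), midpoint threshold.

    S is the pooled within-class covariance. For lam == 0 the Moore-Penrose
    pseudo-inverse of S is used (assumption), so the rule is defined when p > n.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    d = mu1 - mu0
    # S = C^T C, with C the class-centered data scaled by 1/sqrt(n - 2).
    C = np.vstack([X0 - mu0, X1 - mu1]) / np.sqrt(len(X) - 2)
    _, s, Vt = np.linalg.svd(C, full_matrices=False)
    keep = s > 1e-10 * s.max()  # drop numerically zero directions
    s, Vt = s[keep], Vt[keep]
    proj = Vt @ d  # coordinates of d in the range of S
    if lam > 0:
        # S + lam*I has eigenvalues s^2 + lam on range(S) and lam on its complement.
        w = Vt.T @ (proj / (s**2 + lam)) + (d - Vt.T @ proj) / lam
    else:
        # Pseudo-inverse: only the component of d in range(S) is kept.
        w = Vt.T @ (proj / s**2)
    b = -w @ (mu0 + mu1) / 2  # midpoint threshold, i.e. equal test priors
    return w, b


def phi(z):
    """Standard normal CDF."""
    return 0.5 * erfc(-z / sqrt(2))


def balanced_error(w, b, mu):
    """Exact test error when x | y=1 ~ N(mu, I), x | y=0 ~ N(-mu, I), equal priors."""
    m, s = w @ mu, np.linalg.norm(w)
    return 0.5 * (phi(-(m + b) / s) + phi(-(m - b) / s))


def sample(n0, n1, mu, rng):
    """Draw n0 points of class 0 (mean -mu) and n1 points of class 1 (mean +mu)."""
    X = np.vstack([rng.standard_normal((n0, mu.size)) - mu,
                   rng.standard_normal((n1, mu.size)) + mu])
    y = np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]
    return X, y


def compare(p=400, n_major=150, n_minor=30, snr=2.0, lam=0.0, trials=20, seed=0):
    """Mean balanced test error: (full imbalanced data, reduced balanced data)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[0] = snr / 2  # class means are +/- mu, so ||mu1 - mu0|| = snr
    err_imb, err_bal = [], []
    for _ in range(trials):
        X, y = sample(n_major, n_minor, mu, rng)
        err_imb.append(balanced_error(*fit_fld(X, y, lam), mu))
        # Reduced balanced data: keep only n_minor points of the majority class.
        idx0 = rng.choice(np.flatnonzero(y == 0), size=n_minor, replace=False)
        keep = np.r_[idx0, np.flatnonzero(y == 1)]
        err_bal.append(balanced_error(*fit_fld(X[keep], y[keep], lam), mu))
    return float(np.mean(err_imb)), float(np.mean(err_bal))
```

For example, `compare()` (p = 400, n = 180, so p > n) and `compare(lam=1.0)` each return the pair (imbalanced error, reduced-balanced error). Sweeping `p` or `n_major` shows how the comparison changes with the degree of overparametrization. The test error is computed in closed form from the assumed Gaussian model rather than on a held-out sample, which removes Monte Carlo noise from the evaluation step.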
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research
