Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
Yize Zhao, Christos Thrampoulidis

TL;DR
This paper investigates how loss reweighting in deep learning improves early training dynamics by balancing the learning of majority and minority features, even though it does not alter the terminal phase of training in overparameterized networks.
Contribution
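It introduces a small-scale model (SSM) that abstracts away the DNN architecture and the input data while retaining the spectral structure of class imbalance. It then uses this model to show that loss reweighting benefits the early stages of learning.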
Findings
Reweighting accelerates minority class feature learning early in training.
Vanilla ERM favors majority class features initially, delaying minority learning.
Reweighting restores balanced feature learning between classes.
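One standard way to see why ERM can favor majority classes early, and what reweighting changes, is the gradient decomposition below. This is a hedged sketch in our own notation, not the paper's SSM analysis. It assumes K classes with n_c training samples in class c, N = Σ_c n_c, a per-sample loss ℓ, and standard inverse-frequency weights:

```latex
\mathcal{L}_{\mathrm{ERM}}(\theta) = \frac{1}{N}\sum_{c=1}^{K}\sum_{i:\,y_i=c}\ell(\theta; x_i, c),
\qquad
\mathcal{L}_{\mathrm{RW}}(\theta) = \sum_{c=1}^{K}\frac{1}{K\,n_c}\sum_{i:\,y_i=c}\ell(\theta; x_i, c)

\bar g_c = \frac{1}{n_c}\sum_{i:\,y_i=c}\nabla_\theta\,\ell(\theta; x_i, c)
\quad\Longrightarrow\quad
\nabla_\theta\mathcal{L}_{\mathrm{ERM}} = \sum_{c=1}^{K}\frac{n_c}{N}\,\bar g_c,
\qquad
\nabla_\theta\mathcal{L}_{\mathrm{RW}} = \sum_{c=1}^{K}\frac{1}{K}\,\bar g_c
```

Under ERM, each class's mean gradient is scaled by its frequency n_c/N, so minority classes contribute little to early updates. Inverse-frequency reweighting gives every class the same weight 1/K.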
Abstract
The application of loss reweighting in modern deep learning presents a nuanced picture. While it fails to alter the terminal learning phase in overparameterized deep neural networks (DNNs) trained on high-dimensional datasets, empirical evidence consistently shows it offers significant benefits early in training. To transparently demonstrate and analyze this phenomenon, we introduce a small-scale model (SSM). This model is specifically designed to abstract the inherent complexities of both the DNN architecture and the input data, while maintaining key information about the structure of imbalance within its spectral components. On the one hand, the SSM reveals how vanilla empirical risk minimization preferentially learns to distinguish majority classes over minorities early in training, consequently delaying minority learning. In stark contrast, reweighting restores balanced learning…
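The early-training effect can be reproduced in a deliberately tiny toy. This is a minimal sketch under assumptions that are ours and NOT the paper's SSM: a linear softmax classifier on one-hot class inputs, so each class owns one column of the weight matrix, trained by full-batch gradient descent on cross-entropy. The class counts, learning rate, step count, and the names `class_weights` and `train` are hypothetical. `class_weights` implements standard inverse-frequency reweighting.

```python
import math

# Toy illustration only (hypothetical setup, not the paper's small-scale model):
# a linear softmax classifier on one-hot class inputs, trained by full-batch GD.


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def class_weights(counts, reweight):
    """Per-sample loss weight for each class.

    ERM: every sample weighs 1/N, so class c carries total weight n_c/N.
    Reweighted (inverse frequency): every sample of class c weighs 1/(K*n_c),
    so every class carries total weight 1/K.
    """
    K, N = len(counts), sum(counts)
    if reweight:
        return [1.0 / (K * n) for n in counts]
    return [1.0 / N for _ in counts]


def train(counts, reweight, lr=1.0, steps=50):
    """Return the per-class cross-entropy losses after each GD step."""
    K = len(counts)
    # W[j][c] is the logit of class j for an input of class c (zero init).
    W = [[0.0] * K for _ in range(K)]
    w = class_weights(counts, reweight)
    history = []
    for _ in range(steps):
        for c in range(K):
            p = softmax([W[j][c] for j in range(K)])
            # All n_c samples of class c share one input, so their weighted
            # logit gradients (p - onehot(c)) add up to counts[c] * w[c] copies.
            scale = counts[c] * w[c]
            for j in range(K):
                W[j][c] -= lr * scale * (p[j] - (1.0 if j == c else 0.0))
        history.append(
            [-math.log(softmax([W[j][c] for j in range(K)])[c]) for c in range(K)]
        )
    return history


counts = [100, 100, 10]  # hypothetical: two majority classes, one minority class
erm = train(counts, reweight=False)
rw = train(counts, reweight=True)
for step in (0, 9, 49):
    print(f"step {step + 1:2d}  ERM {[round(l, 3) for l in erm[step]]}"
          f"  reweighted {[round(l, 3) for l in rw[step]]}")
```

Because each class owns its own column here, column c follows the same update rule with an effective step size lr·n_c·w_c. Under ERM the minority column moves with one tenth the step size of the majority columns, so its loss lags. Reweighting equalizes the step sizes, so all classes are learned at the same pace. In this toy, weighting changes how fast each class is learned rather than the form of the update. The paper's SSM is what captures the richer spectral structure of imbalance.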
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
