Classification Imbalance as Transfer Learning
Eric Xia, Jason M. Klusowski

TL;DR
This paper models classification imbalance as transfer learning with label shift, analyzing oversampling methods like SMOTE and bootstrapping, and provides guidance on choosing augmentation strategies based on transfer costs.
Contribution
It introduces a transfer learning framework for class imbalance, decomposes excess risk, and compares oversampling methods theoretically and empirically.
Findings
In high dimensions, SMOTE incurs a larger cost of transfer than bootstrapping
Bootstrapping generally has the lower cost of transfer and so compares favorably with SMOTE
The cost of transfer gives guidance for selecting an augmentation strategy
Abstract
Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the…
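To make the two oversampling procedures named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes Gaussian toy data, Euclidean nearest neighbors, and k = 5; the function names `bootstrap_oversample` and `smote_oversample` are hypothetical and chosen for illustration. Bootstrapping resamples observed minority points with replacement, while the SMOTE-style step interpolates between a minority point and one of its k nearest minority neighbors.

```python
import numpy as np

def bootstrap_oversample(X_min, n_new, rng):
    """Random oversampling: draw minority rows with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote_oversample(X_min, n_new, k, rng):
    """SMOTE-style oversampling: interpolate toward a random k-nearest neighbor."""
    n = len(X_min)
    # Pairwise squared Euclidean distances between minority points.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest neighbors
    base = rng.integers(0, n, size=n_new)  # base point for each synthetic sample
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))           # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[nbr] - X_min[base])

# Toy imbalanced data (an assumption for illustration only).
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(200, 5))
X_min = rng.normal(1.0, 1.0, size=(20, 5))

# Generate enough synthetic minority samples to balance the classes.
n_new = len(X_maj) - len(X_min)
X_boot = bootstrap_oversample(X_min, n_new, rng)
X_smote = smote_oversample(X_min, n_new, k=5, rng=rng)
```

Either synthetic set can be stacked with `X_min` to give a roughly balanced training set. In the abstract's terms, each procedure implicitly defines an estimate of the minority-class distribution: the empirical distribution for bootstrapping, and a distribution over line segments between neighboring points for SMOTE.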
Taxonomy
Topics: Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
