Classification Imbalance as Transfer Learning

Eric Xia; Jason M. Klusowski

arXiv:2601.10630·stat.ML·January 16, 2026

TL;DR

This paper models classification imbalance as transfer learning with label shift, analyzing oversampling methods like SMOTE and bootstrapping, and provides guidance on choosing augmentation strategies based on transfer costs.

Contribution

It introduces a transfer learning framework for class imbalance, decomposes the excess risk into the rate achievable under balanced training plus a cost of transfer, and compares oversampling methods theoretically and empirically.

Findings

01 SMOTE's transfer cost dominates (that is, exceeds) that of bootstrapping in high dimensions

02 Bootstrapping generally incurs a lower transfer cost than SMOTE

03 Transfer costs provide guidance for selecting an augmentation strategy

Abstract

Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the…
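To make the two oversampling procedures named in the abstract concrete, here is a minimal sketch using NumPy only. It is not the authors' code: the function names `bootstrap_oversample` and `smote_oversample`, the Gaussian toy data, and the choice `k=5` are all illustrative assumptions. Bootstrapping resamples observed minority points with replacement, while the SMOTE-style routine draws a point on the segment between a minority sample and one of its k nearest minority neighbors.

```python
import numpy as np


def bootstrap_oversample(X_min, n_new, rng):
    """Random oversampling: resample observed minority rows with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]


def smote_oversample(X_min, n_new, k, rng):
    """SMOTE-style oversampling (simplified sketch, requires k < len(X_min)).

    Each synthetic point is a random convex combination of a minority row
    and one of its k nearest minority neighbors.
    """
    n = len(X_min)
    # Pairwise squared Euclidean distances among minority points.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]  # shape (n, k)

    base = rng.integers(0, n, size=n_new)                 # anchor rows
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    lam = rng.random((n_new, 1))                           # interpolation weights
    return X_min[base] + lam * (X_min[nbr] - X_min[base])


# Toy imbalanced data (illustrative assumption): 500 majority, 25 minority.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(500, 5))
X_min = rng.normal(1.5, 1.0, size=(25, 5))

# Generate enough synthetic minority rows to roughly balance the classes.
n_new = len(X_maj) - len(X_min)
X_boot = np.vstack([X_min, bootstrap_oversample(X_min, n_new, rng)])
X_smote = np.vstack([X_min, smote_oversample(X_min, n_new, k=5, rng=rng)])
```

Either augmented minority set can then be stacked with `X_maj` and passed to any classifier. In the abstract's terms, the two routines differ only in the estimated minority-class distribution they sample from: the empirical distribution for bootstrapping, and a neighbor-interpolated one for SMOTE.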

Taxonomy

Topics: Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms