Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
Corinna Cortes, Anqi Mao, Mehryar Mohri, Yutao Zhong

TL;DR
This paper develops a new theoretical framework and algorithms for learning from imbalanced data, providing strong guarantees and demonstrating improved empirical performance over existing methods.
Contribution
It introduces a novel class-imbalanced margin loss, proves its strong $H$-consistency, and proposes the IMMAX algorithm for imbalanced classification.
Findings
The new margin loss is strongly $H$-consistent.
IMMAX outperforms existing baselines in experiments.
Theoretical guarantees are derived based on class-sensitive Rademacher complexity.
Abstract
Class imbalance remains a major challenge in machine learning, especially in multi-class problems with long-tailed distributions. Existing methods, such as data resampling, cost-sensitive techniques, and logistic loss modifications, though popular and often effective, lack solid theoretical foundations. As an example, we demonstrate that cost-sensitive methods are not Bayes-consistent. This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees based on empirical loss and a new notion of class-sensitive Rademacher complexity. Leveraging these theoretical results, we devise novel and general learning algorithms, IMMAX (Imbalanced Margin Maximization), which…
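The abstract does not give the definition of the class-imbalanced margin loss, so the following is only an illustrative sketch of the general idea of a margin loss with a separate margin per class in the binary setting. It assumes a standard ramp ($\rho$-margin) loss and labels in $\{-1, +1\}$; the function names and the parameters `rho_pos` and `rho_neg` are hypothetical and are not taken from the paper, and this is not the IMMAX algorithm itself.

```python
import numpy as np

def ramp(t, rho):
    """Ramp (rho-margin) loss: 1 for t <= 0, 0 for t >= rho, linear in between."""
    return np.clip(1.0 - t / rho, 0.0, 1.0)

def class_margin_loss(scores, labels, rho_pos, rho_neg):
    """Mean ramp loss with a class-dependent margin (assumed form, for illustration).

    scores: real-valued predictions h(x)
    labels: array of -1 / +1
    rho_pos, rho_neg: hypothetical margin parameters for the two classes
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Pick the margin that belongs to each example's own class.
    rho = np.where(labels == 1, rho_pos, rho_neg)
    # labels * scores is the signed margin y * h(x).
    return float(np.mean(ramp(labels * scores, rho)))

# Toy data: the positive class is treated as the rare one and is
# given a larger margin than the negative class.
scores = np.array([2.0, 0.5, -1.0, -0.2])
labels = np.array([1, 1, -1, -1])
loss = class_margin_loss(scores, labels, rho_pos=2.0, rho_neg=0.5)
print(loss)
```

With a larger margin on one class, a correctly classified example of that class still incurs loss unless its score clears the larger threshold, which is one way a per-class margin can shift the decision boundary away from a rare class.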
Taxonomy
Topics: Imbalanced Data Classification Techniques
Methods: Focus
