Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data

Corinna Cortes; Anqi Mao; Mehryar Mohri; Yutao Zhong

arXiv:2502.10381 · cs.LG · December 30, 2025



TL;DR

This paper develops a new theoretical framework and algorithms for learning from imbalanced data, proves consistency and generalization guarantees for them, and reports improved empirical performance over existing methods.

Contribution

It introduces a novel class-imbalanced margin loss, proves its strong $H$-consistency, and proposes the IMMAX algorithm for imbalanced classification.

Findings

1. The new class-imbalanced margin loss is strongly $H$-consistent.

2. IMMAX outperforms existing baselines in experiments.

3. Learning guarantees are derived in terms of a new class-sensitive Rademacher complexity.

Abstract

Class imbalance remains a major challenge in machine learning, especially in multi-class problems with long-tailed distributions. Existing methods, such as data resampling, cost-sensitive techniques, and logistic loss modifications, though popular and often effective, lack solid theoretical foundations. As an example, we demonstrate that cost-sensitive methods are not Bayes-consistent. This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees based on empirical loss and a new notion of class-sensitive Rademacher complexity. Leveraging these theoretical results, we devise novel and general learning algorithms, IMMAX (Imbalanced Margin Maximization), which…
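To make the idea of a class-dependent margin concrete, here is a minimal sketch of a binary margin loss in which each example's margin $y\,h(x)$ is divided by a margin $\rho_{+}$ or $\rho_{-}$ chosen by its class, then passed through the standard ramp ($\rho$-margin) loss. This form is an assumption for illustration only; the exact class-imbalanced margin loss and the IMMAX objective are defined in the paper and may differ. The helper names `ramp_loss`, `hinge_surrogate`, and `class_margin_loss` are hypothetical.

```python
from typing import Callable, Sequence


def ramp_loss(u: float) -> float:
    # Ramp (rho-margin) loss on an already normalized margin u:
    # 1 when u <= 0, 0 when u >= 1, linear in between.
    return min(1.0, max(0.0, 1.0 - u))


def hinge_surrogate(u: float) -> float:
    # Convex upper bound on ramp_loss; a surrogate of this kind is what
    # one would minimize in practice (assumed here, not taken from the paper).
    return max(0.0, 1.0 - u)


def class_margin_loss(
    scores: Sequence[float],
    labels: Sequence[int],
    rho_pos: float,
    rho_neg: float,
    phi: Callable[[float], float] = ramp_loss,
) -> float:
    """Hypothetical empirical loss with one margin per class.

    scores: real-valued predictions h(x); labels: +1 or -1.
    Each margin y * h(x) is scaled by the margin of the example's own class.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    if rho_pos <= 0 or rho_neg <= 0:
        raise ValueError("class margins must be positive")
    total = 0.0
    for h, y in zip(scores, labels):
        rho = rho_pos if y == 1 else rho_neg  # pick the class-specific margin
        total += phi(y * h / rho)
    return total / len(scores)


if __name__ == "__main__":
    scores, labels = [0.5, 2.0, -0.1], [1, 1, -1]
    print(class_margin_loss(scores, labels, rho_pos=1.0, rho_neg=0.5))
    print(class_margin_loss(scores, labels, 1.0, 0.5, phi=hinge_surrogate))
```

With $\rho_{+} = \rho_{-}$ the sketch reduces to the usual single-margin ramp loss; letting the two margins differ is what allows one class to be held to a stricter margin than the other. In this sketch the margins are free parameters, and how they should be chosen is left to the paper.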


Taxonomy

Topics: Imbalanced Data Classification Techniques

Methods: Focus