A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification

Rose Yvette Bandolo Essomba; Ernest Fokoué

arXiv:2601.04149 · stat.ML · January 8, 2026

TL;DR

This paper develops a unified theoretical framework to analyze how class imbalance affects binary classification, validated through empirical experiments on genomic data showing predictable performance degradation regimes.

Contribution

It introduces a novel triplet of parameters $(\eta, \kappa, \Delta)$ (the imbalance coefficient, the sample–dimension ratio, and the intrinsic separability) to explain imbalance effects across models.
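To make the three scales concrete, here is a minimal sketch of how one might estimate them from a labeled sample. The definitions are assumptions chosen for illustration, not the paper's: $\eta$ as the majority-to-minority count ratio, $\kappa$ as $n/p$ (samples per feature), and $\Delta$ as the Mahalanobis distance between class means under a pooled covariance. The `ridge` term is a hypothetical regularizer, needed because $p$ can exceed $n$ in genomic data, and the function name `imbalance_triplet` is likewise hypothetical.

```python
import numpy as np


def imbalance_triplet(X, y, ridge=1e-3):
    """Hypothetical estimator of (eta, kappa, Delta) from a binary-labeled sample.

    Assumed definitions (illustrative, not taken from the paper):
      eta   = n_majority / n_minority
      kappa = n / p
      Delta = Mahalanobis distance between the two class means, using a
              ridge-regularized pooled within-class covariance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    classes, counts = np.unique(y, return_counts=True)
    if len(classes) != 2:
        raise ValueError("exactly two class labels are required")

    eta = counts.max() / counts.min()  # majority-to-minority ratio
    kappa = n / p                      # samples per feature

    X0 = X[y == classes[0]]
    X1 = X[y == classes[1]]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled within-class covariance; the ridge keeps it invertible when p >= n.
    # Note: this forms a p x p matrix, which is costly for very large p.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n - 2)
    S += ridge * np.eye(p)

    diff = mu1 - mu0
    delta = float(np.sqrt(diff @ np.linalg.solve(S, diff)))
    return eta, kappa, delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(90, 5)), rng.normal(loc=1.0, size=(10, 5))])
    y = np.array([0] * 90 + [1] * 10)
    print(imbalance_triplet(X, y))
```

In the example, stacking 90 draws from $N(0, I_5)$ with 10 draws from $N(\mathbf{1}, I_5)$ gives $\eta = 9$ and $\kappa = 20$, with the estimated $\Delta$ near the population value $\sqrt{5} \approx 2.24$ up to sampling noise.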

Findings

1. Degradation regimes are predicted by the triplet $(\eta, \kappa, \Delta)$.
2. Empirical results match theoretical predictions on genomic data.
3. Performance metrics decline predictably as imbalance increases.

Abstract

Class imbalance significantly degrades classification performance, yet its effects are rarely analyzed from a unified theoretical perspective. We propose a principled framework based on three fundamental scales: the imbalance coefficient $\eta$, the sample–dimension ratio $\kappa$, and the intrinsic separability $\Delta$. Starting from the Gaussian Bayes classifier, we derive closed-form Bayes errors and show how imbalance shifts the discriminant boundary, yielding a deterioration slope that predicts four regimes: Normal, Mild, Extreme, and Catastrophic. Using a balanced high-dimensional genomic dataset, we vary only $\eta$ while keeping $\kappa$ and $\Delta$ fixed. Across parametric and non-parametric models, empirical degradation closely follows theoretical predictions: minority Recall collapses once $\log(\eta)$ exceeds $\Delta\kappa$, Precision increases asymmetrically, and…
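The abstract's analysis starts from the Gaussian Bayes classifier. As a hedged illustration of the textbook two-class case (shared covariance, Mahalanobis separation $\Delta$), the sketch below assumes the imbalance coefficient is the prior ratio $\eta = \pi_0/\pi_1 \ge 1$, so the Bayes rule thresholds the log-likelihood ratio at $\log\eta$. Under that assumption, minority recall is $\Phi(\Delta/2 - \log\eta/\Delta)$ and the false-positive rate is $\Phi(-\Delta/2 - \log\eta/\Delta)$. The function name `gaussian_bayes_rates` and its returned keys are hypothetical. This population-level sketch also ignores the sample–dimension ratio $\kappa$, so it does not reproduce the paper's regime boundaries.

```python
from math import log
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF


def gaussian_bayes_rates(eta, delta):
    """Bayes-rule rates for two equal-covariance Gaussians (illustrative sketch).

    Assumes eta = pi0 / pi1 (majority-to-minority prior ratio) and delta is the
    Mahalanobis distance between class means. The log-likelihood ratio is
    N(+delta^2/2, delta^2) under the minority class and N(-delta^2/2, delta^2)
    under the majority class; the Bayes rule predicts minority when it exceeds log(eta).
    """
    if eta < 1 or delta <= 0:
        raise ValueError("expects eta >= 1 and delta > 0")
    pi1 = 1.0 / (1.0 + eta)  # minority prior
    pi0 = 1.0 - pi1          # majority prior
    c = log(eta)             # Bayes threshold log(pi0 / pi1)

    recall = Phi(delta / 2 - c / delta)  # P(predict minority | minority)
    fpr = Phi(-delta / 2 - c / delta)    # P(predict minority | majority)
    bayes_error = pi1 * (1 - recall) + pi0 * fpr
    precision = pi1 * recall / (pi1 * recall + pi0 * fpr)
    return {"recall": recall, "fpr": fpr,
            "precision": precision, "bayes_error": bayes_error}


if __name__ == "__main__":
    for eta in (1, 5, 10, 100):
        print(eta, gaussian_bayes_rates(eta, 2.0))
```

In this sketch, recall falls below 1/2 once $\log\eta > \Delta^2/2$. With $\Delta = 2$ the crossover sits at $\eta = e^2 \approx 7.39$, between the printed cases $\eta = 5$ and $\eta = 10$. At $\eta = 1$ the Bayes error reduces to the familiar $\Phi(-\Delta/2)$.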

Taxonomy

Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Explainable Artificial Intelligence (XAI)