A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification
Rose Yvette Bandolo Essomba, Ernest Fokoué

TL;DR
This paper develops a unified theoretical framework to analyze how class imbalance affects binary classification, validated through empirical experiments on genomic data showing predictable performance degradation regimes.
Contribution
It introduces a novel triplet of parameters $(\text{imbalance coefficient}, \text{sample--dimension ratio}, \text{separability})$ to explain imbalance effects across models.
Findings
Degradation regimes are predicted by the triplet $(\eta, \kappa, \Delta)$.
Empirical results match theoretical predictions on genomic data.
Performance metrics decline predictably as imbalance increases.
Abstract
Class imbalance significantly degrades classification performance, yet its effects are rarely analyzed from a unified theoretical perspective. We propose a principled framework based on three fundamental scales: the imbalance coefficient $\eta$, the sample--dimension ratio $\kappa$, and the intrinsic separability $\Delta$. Starting from the Gaussian Bayes classifier, we derive closed-form Bayes errors and show how imbalance shifts the discriminant boundary, yielding a deterioration slope that predicts four regimes: Normal, Mild, Extreme, and Catastrophic. Using a balanced high-dimensional genomic dataset, we vary only $\eta$ while keeping $\kappa$ and $\Delta$ fixed. Across parametric and non-parametric models, empirical degradation closely follows theoretical predictions: minority Recall collapses once $\eta$ exceeds a critical threshold, Precision increases asymmetrically, and…
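The boundary shift described in the abstract can be illustrated with the textbook two-class Gaussian case. The sketch below is not the paper's derivation: it assumes equal class covariances, takes $\Delta$ to be the Mahalanobis distance between the class means, and uses the minority prior `pi_min` as a stand-in for the imbalance level, since the paper's exact definition of $\eta$ is not given in this summary. The names `phi` and `gaussian_bayes_rates` are hypothetical. Under these assumptions the Bayes threshold along the discriminant direction moves by $\ln((1-\pi)/\pi)/\Delta$ toward the minority class, and both minority Recall and the Bayes error have closed forms in the standard normal CDF.

```python
import math


def phi(x):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_bayes_rates(pi_min, delta):
    """Bayes-optimal rates for two equal-covariance Gaussian classes.

    pi_min : prior probability of the minority class (assumed stand-in
             for the imbalance level; not the paper's eta).
    delta  : Mahalanobis distance between the class means (assumed
             meaning of the separability scale).

    Returns (shift, minority_recall, bayes_error).
    """
    if not 0.0 < pi_min < 1.0:
        raise ValueError("pi_min must lie strictly between 0 and 1")
    if delta <= 0.0:
        raise ValueError("delta must be positive")

    # Displacement of the decision threshold from the midpoint between
    # the projected class means, in units of the common standard deviation.
    shift = math.log((1.0 - pi_min) / pi_min) / delta

    # Minority points are N(+delta/2, 1) along the discriminant direction
    # and are accepted when they fall above the shifted threshold.
    minority_recall = phi(delta / 2.0 - shift)

    # Majority points are N(-delta/2, 1); this is their false-alarm rate.
    majority_false_alarm = phi(-delta / 2.0 - shift)

    bayes_error = (pi_min * (1.0 - minority_recall)
                   + (1.0 - pi_min) * majority_false_alarm)
    return shift, minority_recall, bayes_error


if __name__ == "__main__":
    # Sweep the minority prior at a fixed separability (illustrative values).
    for pi_min in (0.5, 0.2, 0.05, 0.01):
        shift, recall, err = gaussian_bayes_rates(pi_min, delta=2.0)
        print(f"pi_min={pi_min:<5} shift={shift:.3f} "
              f"recall={recall:.4f} bayes_error={err:.4f}")
```

In this idealized setting the overall Bayes error falls as the minority prior shrinks while minority Recall drops sharply, which is one reason accuracy-style metrics can mask the degradation the abstract describes.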
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Explainable Artificial Intelligence (XAI)
