Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification

Mohamed Abdelhamid; Abhyuday Desai

arXiv:2409.19751 · cs.LG · October 1, 2024 · 5 cites

Open Access

TL;DR

This study evaluates three popular strategies for handling class imbalance across multiple models and datasets. It finds that decision threshold calibration is the most consistently effective, but that the best strategy varies by dataset, which underscores the need for tailored approaches.

Contribution

The paper provides a large-scale empirical comparison of SMOTE, class weights, and threshold calibration across diverse datasets and models, highlighting their relative effectiveness and variability.

Findings

01

Decision threshold calibration is most consistently effective.

02

All three strategies outperform no intervention.

03

Effectiveness varies significantly across datasets.
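The first finding concerns decision threshold calibration, which can be illustrated with a minimal sketch: instead of predicting the positive class when the predicted probability is at least 0.5, pick the cutoff that maximizes F1 on held-out data. The function names (`f1_at_threshold`, `calibrate_threshold`) and the choice of candidate thresholds (the unique predicted scores) are assumptions made for illustration, not the paper's procedure.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float) -> float:
    """F1 for the positive class when predicting 1 iff score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def calibrate_threshold(y_true: Sequence[int],
                        scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, f1) that maximizes F1 on the given validation data.

    Assumption: candidates are the unique predicted scores, and the default
    cutoff of 0.5 is kept unless a candidate is strictly better.
    """
    best_threshold = 0.5
    best_f1 = f1_at_threshold(y_true, scores, best_threshold)
    for candidate in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, candidate)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1
```

The threshold should be chosen on validation data and then applied unchanged to the test set; selecting it on the test set would overstate performance.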

Abstract

Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely used strategies for handling class imbalance: Synthetic Minority Over-sampling Technique (SMOTE), Class Weights tuning, and Decision Threshold Calibration. We compare these methods against a baseline scenario of no intervention across 15 diverse machine learning models and 30 datasets from various domains, conducting a total of 9,000 experiments. Performance was primarily assessed using the F1-score, although our study also tracked results on nine additional metrics including F2-score, precision, recall, Brier score, PR-AUC, and AUC. Our results indicate that all three strategies generally outperform the baseline, with Decision Threshold Calibration emerging as the most consistently…
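As a rough illustration of the class-weights strategy named in the abstract, the sketch below computes inverse-frequency weights, so that errors on the minority class count more during training. This is the common "balanced" heuristic, n_samples / (n_classes * class_count), which matches the formula scikit-learn documents for `class_weight="balanced"`. It is only a plausible starting point: the abstract speaks of tuning class weights, and the paper's actual weighting scheme is not given here. The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weight per class: n_samples / (n_classes * class_count).

    Assumption: this heuristic stands in for the paper's tuned weights,
    which the abstract does not specify.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

With 90 negatives and 10 positives, the positive class receives a weight nine times that of the negative class, so the weighted totals of the two classes are equal.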

Taxonomy

Topics: Imbalanced Data Classification Techniques