To SMOTE, or not to SMOTE?
Yotam Elor, Hadar Averbuch-Elor

TL;DR
This study evaluates the impact of data balancing techniques such as SMOTE on state-of-the-art classifiers in imbalanced binary classification. It finds that balancing benefits weak classifiers but not strong ones, and that the outcome depends on the specific scenario.
Contribution
The paper provides a comprehensive empirical analysis of balancing effects on SOTA classifiers, highlighting when balancing is beneficial and emphasizing the importance of metrics and hyper-parameter choices.
Findings
Balancing improves weak classifiers' performance.
Balancing does not improve strong classifiers' performance.
Proper metrics and hyper-parameters significantly influence results.
Abstract
Balancing the data before training a classifier is a popular technique to address the challenges of imbalanced binary classification in tabular data. Balancing is commonly achieved by duplication of minority samples or by generation of synthetic minority samples. While it is well known that balancing affects each classifier differently, most prior empirical studies did not include strong state-of-the-art (SOTA) classifiers as baselines. In this work, we are interested in understanding whether balancing is beneficial, particularly in the context of SOTA classifiers. Thus, we conduct extensive experiments considering three SOTA classifiers alongside the weaker learners used in previous investigations. Additionally, we carefully discern proper metrics, consistent and non-consistent algorithms and hyper-parameter selection methods and show that these have a significant impact on prediction…
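The two balancing strategies named in the abstract, duplicating minority samples and generating synthetic ones, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the choice of `k` are assumptions made for illustration, and the second function follows only the general SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random


def oversample_by_duplication(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Hypothetical helper for illustration; assumes at least one minority sample.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no minority samples to duplicate")
    n_extra = max(0, (len(y) - len(minority)) - len(minority))
    extra = [list(rng.choice(minority)) for _ in range(n_extra)]
    return list(X) + extra, list(y) + [minority_label] * n_extra


def oversample_by_interpolation(X, y, minority_label=1, k=2, seed=0):
    """SMOTE-style oversampling: each synthetic row lies on the line segment
    between a minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; assumes at least two minority samples.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_extra = max(0, (len(y) - len(minority)) - len(minority))

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_extra):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sq_dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return list(X) + synthetic, list(y) + [minority_label] * n_extra


# Toy data (made up): six majority rows (label 0) and two minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_dup, y_dup = oversample_by_duplication(X, y)
X_syn, y_syn = oversample_by_interpolation(X, y)
print(y_dup.count(0), y_dup.count(1))
print(y_syn.count(0), y_syn.count(1))
```

Duplication only repeats existing rows, while interpolation creates new rows inside the region spanned by neighbouring minority samples. In either case, balancing would normally be applied to the training split only, so that the evaluation data keeps its original class ratio.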
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
