The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems
Adrian Stando, Mustafa Cavus, Przemysław Biecek

TL;DR
This paper investigates how data balancing methods affect model behavior in imbalanced classification, revealing significant behavioral changes and proposing a new analysis tool for better balancing strategy selection.
Contribution
The paper analyzes how balancing methods change model behavior, using Explainable AI tools to compare models before and after balancing, and proposes a performance gain plot to help select a balancing strategy.
Findings
Balancing methods significantly alter model behavior.
Models trained on balanced data may become biased toward the balanced distribution.
The proposed performance gain plot aids in selecting the best balancing method.
Abstract
Imbalanced data poses a significant challenge in classification, as model performance suffers from insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods: their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the variable importance method, this study uses the partial dependence profile and accumulated local effects techniques. Real and simulated datasets are tested, and an open-source Python package, edgaro, is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can…
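The before-and-after comparison described in the abstract can be illustrated with a minimal, standard-library-only sketch. It does not use the edgaro package or reproduce the paper's experiments. The toy data, the logistic regression model, random oversampling as the balancing method, and all function names (`make_imbalanced`, `random_oversample`, `fit_logistic`, `partial_dependence`) are assumptions made for illustration, as is the use of the largest gap between profiles as a summary number.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_imbalanced(n=400, seed=0):
    # Hypothetical toy data: feature 0 drives the label, feature 1 is noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(row)
        y.append(1 if rng.random() < sigmoid(2 * row[0] - 3) else 0)
    return X, y


def random_oversample(X, y, seed=0):
    # Duplicate randomly chosen minority rows until both classes are equal in size.
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(X, y):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    n_extra = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(n_extra)]
    return list(X) + extra, list(y) + [minority] * n_extra


def fit_logistic(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the log-loss; returns a predict-probability function.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - label
            grad_w = [g + err * xi for g, xi in zip(grad_w, row)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda row: sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


def partial_dependence(predict, X, feature, grid):
    # For each grid value, overwrite one feature in every row and average the predictions.
    profile = []
    for value in grid:
        preds = [predict(row[:feature] + [value] + row[feature + 1:]) for row in X]
        profile.append(sum(preds) / len(preds))
    return profile


X, y = make_imbalanced()
X_bal, y_bal = random_oversample(X, y)

model_orig = fit_logistic(X, y)
model_bal = fit_logistic(X_bal, y_bal)

# Both profiles are evaluated on the same original data and the same grid,
# so any gap between them comes from the model, not from the evaluation set.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
pd_orig = partial_dependence(model_orig, X, 0, grid)
pd_bal = partial_dependence(model_bal, X, 0, grid)
max_shift = max(abs(a - b) for a, b in zip(pd_orig, pd_bal))

for value, a, b in zip(grid, pd_orig, pd_bal):
    print(f"x0={value:+.1f}  original={a:.3f}  balanced={b:.3f}")
print(f"largest profile shift: {max_shift:.3f}")
```

Evaluating both models on the same rows and grid keeps the comparison about model behavior rather than about the data used for evaluation. In practice, a library implementation of partial dependence or accumulated local effects, and the paper's own comparison measure, would replace these hand-written helpers.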
Taxonomy
Topics: Imbalanced Data Classification Techniques · Statistical and Computational Modeling · Forecasting Techniques and Applications
