Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties
Mohamed S. Kraiem, Fernando Sánchez-Hernández, María N. Moreno-García

TL;DR
This paper analyzes how dataset properties influence the effectiveness of resampling strategies for imbalanced classification and develops models to automatically select the best approach based on dataset characteristics.
Contribution
It introduces a comprehensive comparative study of resampling strategies across diverse datasets and proposes models for automatic strategy selection based on dataset features.
Findings
Models can predict the most suitable resampling strategy based on dataset properties.
Both basic and advanced resampling methods are evaluated with multiple performance metrics.
The approach is domain-independent and adaptable to various imbalanced classification scenarios.
Abstract
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study…
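As a rough illustration of what "balancing the number of examples of each class" means, the sketch below implements the simplest random forms of oversampling and undersampling, plus one common definition of the imbalance ratio (majority count divided by minority count). It is not taken from the paper: the function names, the toy data, and the choice of purely random sampling are assumptions made for illustration, and the study itself may define or implement these differently.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Hypothetical toy dataset: 8 majority examples (class 0), 2 minority (class 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

On this toy data, oversampling grows the minority class to 8 examples by duplication, while undersampling shrinks the majority class to 2 by discarding examples. That trade-off between repeated information and lost information is one reason effectiveness can depend on properties such as dataset size and imbalance ratio.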
