TL;DR
LoRAS is a novel oversampling method for imbalanced datasets that improves both F1-Score and Balanced accuracy by better approximating the minority class data manifold, outperforming SMOTE and its extensions.
Contribution
This paper introduces LoRAS, a new oversampling technique that overcomes SMOTE's over-generalization issue by using Localized Random Affine Shadowsampling to better approximate the data manifold of the minority class.
Findings
LoRAS outperforms SMOTE and its extensions in F1-Score and Balanced accuracy.
LoRAS provides a more accurate estimate of the local minority class data distribution.
Experimental results on 14 datasets demonstrate the effectiveness of LoRAS.
Abstract
The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our algorithm on 14 publicly available imbalanced datasets using three different Machine Learning (ML) algorithms and compared the performance of LoRAS with that of SMOTE and several SMOTE extensions that, like LoRAS, use convex combinations of minority class data points for oversampling. We observed that LoRAS, on average, generates better ML models in terms of F1-Score and…
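The idea of oversampling with convex combinations of noisy "shadowsamples" drawn from a local neighbourhood can be illustrated with a small sketch. This is one reading of the abstract, not the authors' implementation: the function name `loras_like_oversample` and the parameters `k`, `n_shadow`, `n_aff`, and `sigma` are hypothetical, and the Gaussian noise model and Dirichlet weights are assumptions made for illustration.

```python
import numpy as np

def loras_like_oversample(minority, n_new, k=5, n_shadow=10, n_aff=3,
                          sigma=0.005, seed=0):
    """Illustrative sketch only; all names and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n, d = minority.shape
    k = min(k, n)
    if n_aff > k * n_shadow:
        raise ValueError("n_aff must not exceed k * n_shadow")

    # Brute-force k-nearest neighbours; each neighbourhood includes the point itself.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    out = np.empty((n_new, d))
    for i in range(n_new):
        centre = rng.integers(n)
        hood = minority[neighbours[centre]]            # shape (k, d)
        # Shadowsamples: noisy copies of the neighbourhood points (assumed Gaussian noise).
        shadows = np.repeat(hood, n_shadow, axis=0)
        shadows = shadows + rng.normal(0.0, sigma, size=shadows.shape)
        # Random convex combination of a few shadowsamples:
        # Dirichlet weights are non-negative and sum to 1.
        chosen = shadows[rng.choice(len(shadows), size=n_aff, replace=False)]
        weights = rng.dirichlet(np.ones(n_aff))
        out[i] = weights @ chosen
    return out

# Usage: six minority points in the unit square, twenty synthetic points.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
synthetic = loras_like_oversample(pts, n_new=20)
```

Because each synthetic point is built only from points in one local neighbourhood, it stays close to that neighbourhood rather than spanning distant regions of the minority class. With `sigma=0.0` the sketch reduces to plain convex combinations of neighbouring minority points.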
Taxonomy
Methods: Synthetic Minority Over-sampling Technique.
