LoRAS: An oversampling approach for imbalanced datasets

Saptarshi Bej; Narek Davtyan; Markus Wolfien; Mariam Nassar; Olaf Wolkenhauer

arXiv:1908.08346·cs.LG·August 18, 2020

TL;DR

LoRAS is a novel oversampling method for imbalanced datasets that improves both F1-Score and Balanced accuracy by better approximating the minority class data manifold, outperforming SMOTE and its extensions.

Contribution

This paper introduces LoRAS, a new oversampling technique that overcomes SMOTE's over-generalization issue by using localized affine shadowsampling to better model the minority class.

Findings

01. On average, LoRAS outperforms SMOTE and its extensions in F1-Score and Balanced accuracy.

02. LoRAS provides a more accurate estimate of the local minority class data distribution.

03. Benchmarks on 14 publicly available imbalanced datasets, using three ML algorithms, demonstrate the effectiveness of LoRAS.
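
The two metrics named in the findings can be computed from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function names and the toy labels are illustrative assumptions and are not taken from the paper or its repository.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of the true positive rate and the true negative rate."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


# Toy imbalanced example (made up): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))           # 0.5
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

Plain accuracy on this toy example would be 0.8, which illustrates why metrics that weight the minority class are preferred for imbalanced data.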

Abstract

The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications of the majority class and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our algorithm on 14 publicly available imbalanced datasets using three different Machine Learning (ML) algorithms and compared the performance of LoRAS, SMOTE, and several SMOTE extensions that, like LoRAS, use convex combinations of minority class data points for oversampling. We observed that LoRAS, on average, generates better ML models in terms of F1-Score and…
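
To make the idea of oversampling by convex combinations concrete, here is a rough sketch in plain Python. The first function interpolates between a minority point and one of its nearest minority neighbours, which is the usual description of SMOTE. The second is only an assumption about what "shadowsampling" might look like: the abstract does not spell out the procedure, so the Gaussian noise step, the random convex weights, and every parameter name (`n_shadow`, `n_affine`, `sigma`) are hypothetical. This is not the authors' implementation and not the pyloras API.

```python
import math
import random


def nearest_neighbours(points, index, k):
    """Indices of the k points closest (Euclidean) to points[index], excluding itself."""
    dists = sorted(
        (math.dist(points[index], p), j) for j, p in enumerate(points) if j != index
    )
    return [j for _, j in dists[:k]]


def smote_like_sample(points, k, rng):
    """Convex combination of a random minority point and one of its k neighbours."""
    i = rng.randrange(len(points))
    j = rng.choice(nearest_neighbours(points, i, k))
    t = rng.random()  # interpolation factor in [0, 1)
    return [a + t * (b - a) for a, b in zip(points[i], points[j])]


def loras_like_sample(points, k, n_shadow, n_affine, sigma, rng):
    """Assumed scheme: noisy copies of a local neighbourhood, then a random
    convex combination of n_affine of those copies."""
    i = rng.randrange(len(points))
    neighbourhood = [points[i]] + [points[j] for j in nearest_neighbours(points, i, k)]
    # "Shadowsamples" (assumption): each neighbourhood point plus Gaussian noise.
    shadows = [
        [x + rng.gauss(0.0, sigma) for x in p]
        for p in neighbourhood
        for _ in range(n_shadow)
    ]
    chosen = rng.sample(shadows, n_affine)
    # Random positive weights normalised to sum to 1, i.e. a convex combination.
    raw = [rng.expovariate(1.0) for _ in chosen]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(points[0])
    return [sum(w * s[d] for w, s in zip(weights, chosen)) for d in range(dim)]


# Toy minority class (made up) in two dimensions.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
rng = random.Random(0)
print(smote_like_sample(minority, k=2, rng=rng))
print(loras_like_sample(minority, k=2, n_shadow=4, n_affine=3, sigma=0.05, rng=rng))
```

In this sketch, the SMOTE-like sample always lies on a line segment between two minority points, whereas the shadowsample variant draws from the region spanned by several noisy neighbourhood points.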

Code & Models

Repositories

zoj613/pyloras

Taxonomy

Methods: Synthetic Minority Over-sampling Technique (SMOTE)