TL;DR
This paper introduces a simple oversampling technique that combines k-means clustering with SMOTE to avoid generating noisy synthetic samples and to improve classification on imbalanced datasets; in the reported experiments it outperforms existing oversampling methods.
Contribution
The paper proposes a novel oversampling method that integrates k-means clustering with SMOTE, reducing noise and enhancing performance on imbalanced data.
Findings
Outperforms other oversampling methods in experiments on 71 datasets
Effectively reduces noise in synthetic data generation
Improves classification results on imbalanced datasets
Abstract
Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive…
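The abstract describes the method only at a high level: cluster the data with k-means, then oversample with SMOTE. The sketch below is a minimal NumPy-only illustration of that idea, built on several assumptions of our own. Clusters are kept only when minority points dominate, using the ratio (n_maj + 1) / (n_min + 1) against a threshold. Synthetic samples are split across the kept clusters in proportion to a sparsity score. k-means uses farthest-first initialisation. The function names `kmeans`, `smote` and `kmeans_smote`, and every parameter default, are hypothetical; this is not the authors' reference implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialisation (an assumption for this sketch)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Add the point farthest from every centre chosen so far.
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Move each centre to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def smote(X_min, n_samples, k_neighbors, rng):
    """Plain SMOTE: interpolate between a minority point and one of its minority neighbours."""
    n = len(X_min)
    k = min(k_neighbors, n - 1)  # callers guarantee n >= 2, so k >= 1
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_samples)
    partner = neighbours[base, rng.integers(0, k, size=n_samples)]
    gap = rng.random((n_samples, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def kmeans_smote(X, y, minority_label, n_new, n_clusters=8,
                 imbalance_threshold=1.0, k_neighbors=5, seed=0):
    """Hypothetical k-means + SMOTE oversampler; returns only the synthetic rows."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, n_clusters, seed=seed)
    is_min = y == minority_label
    kept, sparsity = [], []
    for j in range(n_clusters):
        in_cluster = labels == j
        n_min = int(np.sum(in_cluster & is_min))
        n_maj = int(np.sum(in_cluster & ~is_min))
        # Filter: skip clusters SMOTE cannot use or where majority points dominate.
        if n_min < 2 or (n_maj + 1) / (n_min + 1) >= imbalance_threshold:
            continue
        X_c = X[in_cluster & is_min]
        d = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
        mean_dist = d[np.triu_indices(n_min, k=1)].mean()
        # Sparser clusters (spread-out, few minority points) receive more samples.
        kept.append(X_c)
        sparsity.append(mean_dist ** X.shape[1] / n_min)
    if not kept:
        return np.empty((0, X.shape[1]))
    total = np.sum(sparsity)
    weights = np.array(sparsity) / total if total > 0 else np.full(len(kept), 1 / len(kept))
    counts = np.floor(weights * n_new).astype(int)
    counts[np.argmax(weights)] += n_new - counts.sum()  # hand the rounding remainder out
    parts = [smote(X_c, c, k_neighbors, rng) for X_c, c in zip(kept, counts) if c > 0]
    return np.vstack(parts) if parts else np.empty((0, X.shape[1]))


# Usage: a well-separated toy problem with 200 majority and 20 minority points.
rng = np.random.default_rng(42)
X_min = rng.normal(8.0, 0.5, size=(20, 2))
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), X_min])
y = np.array([0] * 200 + [1] * 20)
X_new = kmeans_smote(X, y, minority_label=1, n_new=180, n_clusters=6)
print(X_new.shape)
```

In this sketch, restricting SMOTE to minority-dominated clusters keeps interpolation away from majority regions, which is one way to read the noise-avoidance claim, while the per-cluster sparsity weights steer more samples toward thinly populated minority regions to address imbalance within the minority class.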
Taxonomy
Methods: k-Means Clustering
