Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Felix Last; Georgios Douzas; Fernando Bacao

arXiv:1711.00837 · cs.LG · March 6, 2020

TL;DR

This paper introduces a simple, noise-reducing oversampling technique that combines k-means clustering with SMOTE to improve classification on imbalanced datasets. In the reported experiments, it outperforms other popular oversampling methods.

Contribution

The paper proposes a novel oversampling method that integrates k-means clustering with SMOTE, reducing noise and enhancing performance on imbalanced data.
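For reference, standard SMOTE creates each synthetic minority sample by interpolating between a minority sample $\mathbf{x}_i$ and one of its $k$ nearest minority-class neighbours $\mathbf{x}_j$. The rule below is the textbook SMOTE interpolation. The restriction that both points come from the same selected k-means cluster summarizes the combined method as described here and is not quoted from the paper.

```latex
\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda\,(\mathbf{x}_j - \mathbf{x}_i),
\qquad \lambda \sim \mathcal{U}(0, 1),
\qquad \mathbf{x}_j \in \operatorname{kNN}_{\text{minority}}(\mathbf{x}_i)
```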

Findings

1. Outperforms other oversampling methods on 71 datasets
2. Effectively reduces noise in synthetic data generation
3. Improves classification performance on imbalanced datasets

Abstract

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive…
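The cluster-then-oversample idea in the abstract can be sketched in plain Python as three steps: cluster all samples with k-means, keep only clusters where the minority class dominates, and run SMOTE-style interpolation inside those clusters. This is a minimal, simplified sketch and not the authors' implementation in felix-last/kmeans_smote. It makes several assumptions. The task is binary. The helper names `kmeans` and `kmeans_smote` and the parameter `irt` (imbalance-ratio threshold) are hypothetical. The sparsity weight (mean pairwise minority distance raised to the number of features, divided by the minority count) is an assumed form. Clusters are chosen by weighted random draw rather than by fixed per-cluster quotas.

```python
import math
import random


def kmeans(points, k, iters=50, rng=None):
    """Plain Lloyd's k-means on a list of tuples; returns one cluster label per point."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_smote(X, y, minority, n_clusters=2, irt=1.0, k_neighbors=3, seed=0):
    """Hypothetical, simplified binary k-means SMOTE sketch; returns (X_resampled, y_resampled)."""
    rng = random.Random(seed)
    labels = kmeans(X, n_clusters, rng=rng)
    n_min = sum(1 for t in y if t == minority)
    n_needed = len(y) - 2 * n_min  # majority count minus minority count

    # Filter step (assumed rule): keep clusters with (majority + 1) / (minority + 1) < irt.
    kept, weights = [], []
    for c in range(n_clusters):
        mins = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority]
        n_maj = sum(1 for t, lab in zip(y, labels) if lab == c and t != minority)
        if len(mins) < 2 or (n_maj + 1) / (len(mins) + 1) >= irt:
            continue
        # Sparsity weight (assumed form): sparser minority clusters receive more samples.
        dists = [math.dist(a, b) for i, a in enumerate(mins) for b in mins[i + 1:]]
        mean_dist = sum(dists) / len(dists)
        kept.append(mins)
        weights.append(mean_dist ** len(X[0]) / len(mins))
    if not kept:
        raise ValueError("no cluster passed the imbalance-ratio filter")
    if sum(weights) == 0:  # every kept cluster holds only duplicate points
        weights = [1.0] * len(kept)

    # Oversampling step: SMOTE interpolation between minority points of the same cluster.
    synthetic = []
    for _ in range(max(n_needed, 0)):
        mins = rng.choices(kept, weights=weights)[0]
        i = rng.randrange(len(mins))
        a = mins[i]
        others = sorted((m for j, m in enumerate(mins) if j != i), key=lambda m: math.dist(a, m))
        b = rng.choice(others[:k_neighbors])
        lam = rng.random()
        synthetic.append(tuple(ai + lam * (bi - ai) for ai, bi in zip(a, b)))

    return list(X) + synthetic, list(y) + [minority] * len(synthetic)
```

For example, if a lone minority point sits inside a majority-dominated cluster, that cluster fails the `irt` filter. No synthetic samples are then interpolated toward it, which is the noise-avoidance behaviour the abstract describes.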

Code & Models

Repositories

felix-last/kmeans_smote (official)

Taxonomy

Methods: k-Means Clustering