A Self-Adaptive Synthetic Over-Sampling Technique for Imbalanced Classification

Xiaowei Gu; Plamen P Angelov; Eduardo Almeida Soares

arXiv:1911.11018 · cs.LG · April 21, 2020

TL;DR

This paper introduces a self-adaptive synthetic over-sampling method that addresses class imbalance in supervised learning and improves fairness and performance across various algorithms, especially when data are limited.

Contribution

The proposed technique synthesizes data near minority-class samples to balance the classes and enhance classifier performance; it can be combined with multiple classification algorithms.
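The general idea of synthesising samples close to real minority-class samples until the classes are balanced can be illustrated with a generic sketch. This is not the authors' algorithm: the Gaussian jitter, the noise scale (half the mean nearest-neighbour distance within each class, used here as a stand-in for a "self-adaptive" parameter), and the function names `oversample_vicinity` and `nn_scale` are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def nn_scale(points):
    """Mean distance from each point to its nearest neighbour within `points`.

    Used below as a data-derived noise scale (an illustrative assumption,
    not the paper's rule).
    """
    if len(points) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def oversample_vicinity(X, y, seed=0):
    """Add jittered copies of real samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(x) for x in X]  # keep all real samples unchanged
    y_out = list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        # Hypothetical choice: noise std = half the mean nearest-neighbour distance.
        sigma = nn_scale(members) / 2
        for _ in range(target - n):
            base = rng.choice(members)  # pick a real sample of this class
            X_out.append([v + rng.gauss(0.0, sigma) for v in base])
            y_out.append(label)
    return X_out, y_out


# Toy data: four samples of class 0, two of class 1 (the minority class).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_vicinity(X, y)
print(dict(Counter(y_bal)))  # {0: 4, 1: 4}
```

Gaussian jitter is only one way to stay "in close vicinity" of the real data; SMOTE-style interpolation between same-class neighbours is a common alternative.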

Findings

1. Achieves more balanced and fair classification results.
2. Boosts overall performance and class-specific accuracy.
3. Effective with small labeled datasets.

Abstract

Traditionally, in supervised machine learning, a significant part of the available data (usually 50% to 80%) is used for training and the rest for validation. In many problems, however, the data is highly imbalanced across classes or does not cover the feasible data space well, which in turn creates problems in the validation and usage phases. In this paper, we propose a technique for synthesising feasible and likely data to help balance the classes and to boost performance, both per class (in terms of the confusion matrix) and overall. The idea, in a nutshell, is to synthesise data samples in close vicinity to the actual data samples, specifically for the less represented (minority) classes. This also has implications for the so-called fairness of machine learning. In this paper, we propose a specific method for synthesising data in a way to balance the classes and…
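The abstract separates performance per class (read off the confusion matrix) from overall performance. A small hypothetical example shows why the distinction matters on imbalanced data: a classifier that ignores the minority class can still reach high overall accuracy. The counts are invented for illustration and are not results from the paper, and the function names are hypothetical.

```python
def per_class_accuracy(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns, for each true class, the fraction of its samples classified correctly.
    """
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def overall_accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical two-class case: 95 majority samples, 5 minority samples,
# and a classifier that predicts the majority class for everything.
cm = [[95, 0],
      [5, 0]]
print(overall_accuracy(cm))    # 0.95
print(per_class_accuracy(cm))  # [1.0, 0.0]
```

The 95% overall figure hides the fact that no minority sample is recognised, which is the kind of imbalance that over-sampling the minority class aims to reduce.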
