Tackling Diverse Minorities in Imbalanced Classification

Kwei-Herng Lai; Daochen Zha; Huiyuan Chen; Mangesh Bendre; Yuzhong Chen; Mahashweta Das; Hao Yang; Xia Hu

arXiv:2308.14838·cs.LG·August 30, 2023

TL;DR

This paper introduces a novel iterative data augmentation framework using reinforcement learning to generate synthetic minority samples, effectively improving classification in highly imbalanced and diverse minority scenarios.

Contribution

The paper formulates data augmentation as a Markov decision process (MDP) and uses an actor-critic approach to generate synthetic samples adaptively, addressing the challenge of diverse minority distributions.

Findings

01

Improved classifier performance on imbalanced datasets.

02

Effective handling of diverse minority distributions.

03

Robustness across multiple classifiers and datasets.

Abstract

Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. When working with large datasets, the imbalance issue can be further exacerbated, making it exceptionally difficult to train classifiers effectively. To address the problem, over-sampling techniques have been developed that linearly interpolate data instances between minorities and their neighbors. However, in many real-world scenarios such as anomaly detection, minority instances are often dispersed diversely in the feature space rather than clustered together. Inspired by domain-agnostic data mix-up, we propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. It is non-trivial to develop such a framework; the challenges include source sample selection, mix-up strategy selection, and the…
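The two sampling ideas the abstract contrasts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's method: the function names, the single-nearest-neighbor rule, the uniform random choice of source samples, and the `lam_low` bound on the mixing ratio are all hypothetical. The paper learns its selection and mix-up strategy with reinforcement learning; here those choices are simply drawn at random.

```python
import math
import random


def interpolate(x_a, x_b, lam):
    """Convex combination lam * x_a + (1 - lam) * x_b, applied per feature."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def nearest_neighbor(index, points):
    """Return the point closest (Euclidean) to points[index], excluding itself."""
    best, best_dist = None, math.inf
    for j, p in enumerate(points):
        if j == index:
            continue
        d = math.dist(points[index], p)
        if d < best_dist:
            best, best_dist = p, d
    return best


def oversample_within_minority(minority, n_new, rng):
    """Simplified interpolation-based over-sampling (needs >= 2 minority points).

    Each synthetic point lies on the segment between a random minority
    sample and its single nearest minority neighbor.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbor = nearest_neighbor(i, minority)
        synthetic.append(interpolate(minority[i], neighbor, rng.random()))
    return synthetic


def mix_with_majority(minority, majority, n_new, rng, lam_low=0.5):
    """Mix a random minority sample with a random majority sample.

    Assumption: lam is drawn from [lam_low, 1] so the result stays at least
    as close to the minority source as to the majority source.
    """
    synthetic = []
    for _ in range(n_new):
        x_min = rng.choice(minority)
        x_maj = rng.choice(majority)
        lam = rng.uniform(lam_low, 1.0)
        synthetic.append(interpolate(x_min, x_maj, lam))
    return synthetic


rng = random.Random(0)
# Dispersed minority points surrounding a compact majority cluster.
minority = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
majority = [[5.0, 5.0], [4.0, 6.0], [6.0, 4.0], [5.0, 4.0]]
within = oversample_within_minority(minority, 4, rng)
mixed = mix_with_majority(minority, majority, 4, rng)
```

In the toy data the minority points sit far apart, so segments between minority neighbors can cut straight through the majority region. That is the dispersed-minority situation the abstract describes as problematic for neighbor-based interpolation.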
