AEMLO: AutoEncoder-Guided Multi-Label Oversampling

Ao Zhou; Bin Liu; Jin Wang; Kaiwei Sun; Kelin Liu

arXiv:2408.13078 · cs.LG · August 26, 2024

TL;DR

AEMLO introduces an AutoEncoder-guided oversampling method that generates diverse synthetic samples for imbalanced multi-label datasets, improving classifier performance over existing techniques.

Contribution

The paper proposes a novel AutoEncoder-based oversampling approach specifically designed for multi-label imbalance, with a tailored objective function for better synthetic sample generation.

Findings

1. AEMLO outperforms state-of-the-art oversampling methods in empirical tests.
2. The method effectively generates diverse synthetic samples for imbalanced multi-label data.
3. AEMLO improves multi-label classifier performance on benchmark datasets.

Abstract

Class imbalance significantly degrades the performance of multi-label classifiers. Oversampling is one of the most popular remedies: it augments instances associated with less frequent labels to balance the class distribution. Existing oversampling methods generate the feature vectors of synthetic samples through replication or linear interpolation and assign their labels using neighborhood information. Linear interpolation typically places new samples between existing data points, which can yield synthetic samples with insufficient diversity and, in turn, lead to overfitting. Deep learning-based methods, such as AutoEncoders, have been proposed to generate more diverse and complex synthetic samples, and they have performed well on imbalanced binary and multi-class datasets. In this study, we introduce AEMLO, an AutoEncoder-guided Oversampling technique specifically…
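The interpolation-based baseline that the abstract criticizes can be made concrete with a small sketch. The Python below is an illustrative baseline, not AEMLO. It creates one synthetic multi-label sample by linear interpolation between a seed instance and one of its k nearest neighbours, then assigns labels by majority vote over the seed and those neighbours. The function names (`k_nearest`, `interpolate_sample`), the Euclidean distance, and the majority-vote label rule are assumptions chosen for illustration. Published methods differ in the details.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; the Euclidean distance and the
# majority-vote label rule are assumptions, not taken from the paper.


def k_nearest(X: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def interpolate_sample(
    X: Sequence[Sequence[float]],
    Y: Sequence[Sequence[int]],
    seed: int,
    k: int,
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """Create one synthetic (features, labels) pair from X[seed]."""
    neighbours = k_nearest(X, seed, k)
    ref = rng.choice(neighbours)
    gap = rng.random()  # in [0, 1): the new point lies on the segment seed -> ref
    x_new = [a + gap * (b - a) for a, b in zip(X[seed], X[ref])]
    # Keep a label only if more than half of the seed and its k neighbours carry it.
    group = [seed] + neighbours
    y_new = [
        1 if 2 * sum(Y[j][label] for j in group) > len(group) else 0
        for label in range(len(Y[seed]))
    ]
    return x_new, y_new


# Toy data: four 2-D points, two binary labels each.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
Y = [[1, 0], [1, 1], [1, 0], [0, 1]]
x_new, y_new = interpolate_sample(X, Y, seed=0, k=2, rng=random.Random(0))
print(x_new, y_new)
```

Because `gap` is drawn from [0, 1), every synthetic point lies on the line segment between two existing samples, so the new features never leave the region already spanned by the data. This is the limited-diversity issue the abstract attributes to interpolation, and it is the motivation for generating samples with an AutoEncoder instead.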

Code & Models

Repositories

cquptza/aemlo (PyTorch, official)

Taxonomy

Topics: Text and Document Classification Technologies