Counterfactual-based minority oversampling for imbalanced classification

Hao Luo; Li Liu

arXiv:2008.09488 · cs.LG · December 24, 2020 · 1 citation

TL;DR

This paper introduces a counterfactual-based oversampling method for imbalanced classification. It generates new minority samples by perturbing majority-class data, which places them closer to the decision boundary and improves overall classification performance.

Contribution

It proposes a novel counterfactual framework that leverages majority class information to generate more effective minority samples near decision boundaries.

Findings

1. Outperforms state-of-the-art oversampling methods on benchmark datasets.
2. Generates minority samples near decision boundaries, improving classifier performance.
3. Shows analytically that the generated samples satisfy the minimum inversion criterion.

Abstract

A key challenge of oversampling in imbalanced classification is that the generation of new minority samples often neglects the majority classes, so most new minority samples end up spread over the whole minority space. In view of this, we present a new oversampling framework based on counterfactual theory. Our framework introduces a counterfactual objective that leverages the rich inherent information of the majority classes and explicitly perturbs majority samples to generate new samples in the territory of the minority space. It can be shown analytically that the new minority samples satisfy the minimum inversion, and therefore most of them lie near the decision boundary. Empirical evaluations on benchmark datasets suggest that our approach significantly outperforms state-of-the-art methods.
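The abstract does not spell out the counterfactual objective. To make the idea of "perturbing majority samples into the minority territory with a minimal change" concrete, here is a minimal sketch that assumes a linear (logistic-regression) classifier. Under that assumption, the smallest L2 perturbation that moves a point to a chosen score on the minority side has a closed form: a projection onto the decision hyperplane followed by a small step past it. The helpers `fit_logistic` and `counterfactual_oversample`, the `margin` parameter, and the toy Gaussian data are all hypothetical illustrations. This is not the authors' framework.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain batch gradient descent on the logistic loss; y in {0, 1}, 1 = minority."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(minority)
        grad = p - y                              # d(loss)/d(score) per sample
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def counterfactual_oversample(X_maj, w, b, n_new, margin=0.1, rng=None):
    """Hypothetical helper: map randomly chosen majority points to their
    minimum-L2 counterfactuals under a linear score g(x) = w.x + b
    (minority iff g > 0). The closest point with g(x') = margin is
    x' = x + (margin - g(x)) * w / ||w||^2.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X_maj), size=n_new, replace=True)
    X = X_maj[idx]
    g = X @ w + b                                  # current scores (mostly < 0)
    return X + np.outer(margin - g, w) / (w @ w)   # shift along the normal w

# Toy usage: 200 majority points vs. 20 minority points in 2-D.
rng = np.random.default_rng(0)
X_maj = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(200, 2))
X_min = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(20, 2))
X = np.vstack([X_maj, X_min])
y = np.r_[np.zeros(200), np.ones(20)]

w, b = fit_logistic(X, y)
X_new = counterfactual_oversample(X_maj, w, b, n_new=180, rng=1)
scores = X_new @ w + b  # every new sample sits just inside the minority side
```

Because each synthetic point is the nearest point to a majority sample that the classifier labels as minority, the points cluster along the decision boundary rather than filling the whole minority region. That is the qualitative behaviour the abstract attributes to minimum inversion. With a nonlinear classifier, the closed form no longer applies, and the perturbation would have to be found by optimization instead.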

Taxonomy

Topics: Imbalanced Data Classification Techniques · Electricity Theft Detection Techniques · Text and Document Classification Technologies