Self-paced Ensemble for Highly Imbalanced Massive Data Classification

Zhining Liu; Wei Cao; Zhifeng Gao; Jiang Bian; Hechang Chen; Yi Chang; Tie-Yan Liu

arXiv:1909.03500 · cs.LG · October 20, 2020

TL;DR

This paper introduces a self-paced ensemble framework for classification on large-scale datasets that are highly imbalanced, noisy, and overlapping. It harmonizes data hardness through under-sampling to improve both robustness and efficiency.

Contribution

The paper proposes a computationally efficient ensemble method that addresses class imbalance, noise, and class overlap through self-paced data harmonization, and that can be adapted to various classifiers.
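The harmonization step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-width hardness bins, the tangent schedule for the self-paced factor, and the use of a per-sample hardness score (for example, the current ensemble's absolute prediction error) are all assumptions made for illustration, since the summary here does not specify them. The sketch splits majority-class samples into hardness bins and draws from each bin in proportion to 1 / (mean hardness + alpha), so that a small alpha equalizes each bin's total hardness contribution and a large alpha spreads the draw evenly across bins.

```python
import math
import random


def self_paced_factor(iteration, n_iterations):
    """Assumed schedule: starts at 0 and grows quickly as training proceeds."""
    return math.tan(math.pi * iteration / (2 * n_iterations))


def harmonized_undersample(hardness, n_keep, k_bins=5, alpha=0.0, seed=0):
    """Return roughly n_keep majority-sample indices, balanced across hardness bins.

    hardness: one non-negative score per majority sample (hypothetical input).
    """
    rng = random.Random(seed)
    lo, hi = min(hardness), max(hardness)
    width = (hi - lo) / k_bins or 1.0  # avoid zero width when all scores match

    # Assign every sample to an equal-width hardness bin.
    bins = [[] for _ in range(k_bins)]
    for idx, h in enumerate(hardness):
        b = min(int((h - lo) / width), k_bins - 1)
        bins[b].append(idx)

    # Unnormalized weight per non-empty bin: 1 / (mean hardness + alpha).
    weights = []
    for members in bins:
        if members:
            mean_h = sum(hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + 1e-12))
        else:
            weights.append(0.0)
    total = sum(weights)

    # Draw each bin's share without replacement, capped by the bin size.
    chosen = []
    for members, w in zip(bins, weights):
        quota = min(len(members), round(n_keep * w / total))
        chosen.extend(rng.sample(members, quota))
    return chosen
```

In a full ensemble loop, one would presumably recompute the hardness scores from the current ensemble at each iteration, raise alpha with `self_paced_factor`, and train the next base classifier on the selected majority samples plus all minority samples. Because of rounding and the per-bin cap, the number of returned indices may differ slightly from `n_keep`.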

Findings

01 Achieves robust performance on highly imbalanced datasets

02 Enhances existing classifiers such as SVM, GBDT, and neural networks

03 Maintains high computational efficiency

Abstract

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalanced and low-quality datasets. Most existing learning methods suffer from poor performance or low computational efficiency in such scenarios. To tackle this problem, we conduct a deep investigation into the nature of class imbalance, which reveals that not only the disproportion between classes but also other difficulties embedded in the data, especially noise and class overlap, prevent us from learning effective classifiers. Taking those factors into consideration, we propose a novel framework for imbalanced classification that aims to generate a strong ensemble by self-paced harmonization of data hardness via under-sampling. Extensive experiments have shown that this new…

Code & Models

Repositories

ZhiningLiu1998/self-paced-ensemble
Official

Taxonomy

Methods: Support Vector Machine