TL;DR
This paper introduces a self-paced ensemble framework that improves classification performance on highly imbalanced, noisy, and overlapping large-scale datasets by harmonizing data hardness through under-sampling, enhancing robustness and efficiency.
Contribution
It proposes a novel, computationally efficient ensemble method that addresses class imbalance, noise, and class overlap through self-paced harmonization of data hardness, and that can be combined with various classifiers.
Findings
Achieves robust performance on highly imbalanced datasets
Enhances existing classifiers like SVM, GBDT, Neural Networks
Maintains high computational efficiency
Abstract
Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalanced and low-quality datasets. Most existing learning methods suffer from poor performance or low computational efficiency under such a scenario. To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveal that not only the disproportion between classes, but also other difficulties embedded in the nature of data, especially noise and class overlapping, prevent us from learning effective classifiers. Taking those factors into consideration, we propose a novel framework for imbalanced classification that aims to generate a strong ensemble by self-paced harmonization of data hardness via under-sampling. Extensive experiments have shown that this new…
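The idea of harmonizing data hardness via under-sampling can be illustrated with a minimal sketch. It is not the authors' implementation; it makes several assumptions that the abstract does not state. Each sample has a hardness score in [0, 1] (for example, the current ensemble's absolute prediction error). Samples are grouped into equal-width hardness bins. Each bin is sampled with weight 1 / (mean bin hardness + alpha), where alpha is a hypothetical self-paced factor that grows over training rounds. The function name `self_paced_undersample` and all of its parameters are illustrative.

```python
import math
import random


def self_paced_undersample(hardness, n_samples, alpha, k_bins=10, seed=0):
    """Pick sample indices so that each hardness bin contributes in
    proportion to 1 / (mean bin hardness + alpha).

    Assumes `hardness` is a non-empty list of scores in [0, 1].
    """
    rng = random.Random(seed)

    # Group sample indices into k equal-width hardness bins over [0, 1].
    bins = [[] for _ in range(k_bins)]
    for idx, h in enumerate(hardness):
        b = min(int(h * k_bins), k_bins - 1)
        bins[b].append(idx)

    # Unnormalized sampling weight per bin; empty bins get weight 0.
    eps = 1e-12  # guards against division by zero when alpha == 0
    weights = []
    for members in bins:
        if members:
            mean_h = sum(hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + eps))
        else:
            weights.append(0.0)
    total = sum(weights)

    # Draw from each bin without replacement, capped at the bin's size.
    selected = []
    for members, w in zip(bins, weights):
        quota = min(len(members), round(n_samples * w / total))
        selected.extend(rng.sample(members, quota))
    return selected


# Toy data: 90 easy samples (hardness 0.05) and 10 hard ones (0.95).
toy_hardness = [0.05] * 90 + [0.95] * 10
early_round = self_paced_undersample(toy_hardness, 20, alpha=0.0)
late_round = self_paced_undersample(toy_hardness, 20, alpha=1e6)
```

With a small alpha, easy bins dominate the drawn subset; as alpha grows, the weights flatten and every non-empty bin contributes about equally, so hard samples make up a larger share. One possible schedule (again an assumption) is `alpha = math.tan(i * math.pi / (2 * n))` for round `i` of `n`, with a base classifier trained on each drawn subset and the resulting models averaged.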
Taxonomy
Methods: Support Vector Machine
