CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Farshid Rayhan; Sajid Ahmed; Asif Mahbub; Md. Rafsan Jani; Swakkhar Shatabda; and Dewan Md. Farid

arXiv:1712.04356·cs.LG·September 5, 2018

TL;DR

This paper introduces CUSBoost, which combines clustering-based under-sampling of the majority class with boosting to improve classification on highly imbalanced datasets by reducing bias towards the majority class.

Contribution

The paper proposes CUSBoost, a new ensemble learning approach that integrates clustering-based under-sampling with AdaBoost, outperforming existing methods like RUSBoost and SMOTEBoost.
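The under-sampling half of this idea can be sketched in a few lines. The listing here does not spell out the sampling step, so the following is only a minimal illustration under stated assumptions: a small hand-written k-means stands in for the clustering, and the names `kmeans`, `cluster_undersample`, and the parameters `k` and `fraction` are hypothetical, not taken from the paper or its repository. The boosting loop itself is omitted.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on tuples of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, k=3, fraction=0.5, seed=0):
    """Keep a random `fraction` of the majority points from each cluster.

    Assumption: sampling per cluster (rather than globally) is what keeps
    every region of the majority class represented after under-sampling.
    """
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        n_keep = max(1, round(fraction * len(members)))
        kept.extend(rng.sample(members, n_keep))
    return kept


# Synthetic data: 120 majority points in three blobs, 10 minority points.
data_rng = random.Random(1)
majority = [
    (data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(40)
]
minority = [(data_rng.gauss(2.5, 0.3), data_rng.gauss(2.5, 0.3)) for _ in range(10)]

reduced = cluster_undersample(majority, k=3, fraction=0.5)
# Training set for one weak learner: reduced majority (label 0) plus all minority (label 1).
training_set = [(p, 0) for p in reduced] + [(p, 1) for p in minority]
print(len(majority), "->", len(reduced), "majority points kept")
```

In a boosting setting, a step like this would presumably be repeated in each round before fitting the weak learner, so that different rounds see different majority subsets.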

Findings

01. CUSBoost outperforms state-of-the-art ensemble methods on various datasets.

02. It effectively handles highly imbalanced datasets with improved accuracy.

03. Experimental results validate its robustness across multiple imbalance ratios.

Abstract

Class-imbalanced classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances usually represent the concept of greater interest. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach with the boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The…
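The abstract's point that accuracy can be maximised while the minority class is ignored is easy to check with a little arithmetic. The sketch below uses made-up labels (95 majority, 5 minority) and a hypothetical helper `accuracy_and_recall`; none of these numbers come from the paper.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall on the `positive` class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Illustrative data only: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

The always-majority classifier scores high on accuracy while finding none of the minority instances, which is why accuracy alone is a poor guide on imbalanced data.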

Code & Models

Repositories

farshidrayhanuiu/CUSBoost
