Handling Imbalanced Dataset in Multi-label Text Categorization using Bagging and Adaptive Boosting

Genta Indra Winata; Masayu Leylia Khodra

arXiv:1810.11612·cs.CL·June 12, 2019

TL;DR

This paper applies Bagging and Adaptive Boosting to class imbalance in multi-label text categorization. The best configuration depends on the metric: Bagging with an SMO weak classifier leads on accuracy and micro-averaged F-measure, while AdaBoost MH with J48 has the lowest Hamming loss.

Contribution

It introduces the application of Bagging and Adaptive Boosting techniques to imbalanced multi-label text categorization, highlighting their effectiveness with different weak classifiers.

Findings

01

Bagging ML-LP with SMO performs best in subset and example-based accuracy.

02

Bagging ML-BR with SMO achieves the highest micro-averaged F-measure.

03

AdaBoost MH with J48 has the lowest Hamming loss.
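
The findings above compare two problem transformations, ML-BR and ML-LP. This page does not expand the acronyms; the sketch below assumes they denote the usual binary relevance and label powerset transformations, and shows how each turns multi-label targets into single-label ones that a weak classifier can consume. The label names and the helper functions `binary_relevance` and `label_powerset` are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def binary_relevance(labels: List[Set[str]], label_space: List[str]) -> Dict[str, List[int]]:
    """Build one binary target vector per label (assumed meaning of ML-BR)."""
    return {lab: [int(lab in y) for y in labels] for lab in label_space}


def label_powerset(labels: List[Set[str]]) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct label set to one class id (assumed meaning of ML-LP)."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for y in labels:
        key = frozenset(y)
        if key not in class_ids:
            # First time this exact label combination is seen: assign a new class.
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


# Hypothetical complaint categories, for illustration only.
labels = [{"roads"}, {"roads", "water"}, {"health"}, {"roads"}]
label_space = ["roads", "water", "health"]

br = binary_relevance(labels, label_space)
lp_targets, lp_classes = label_powerset(labels)

print(br)
print(lp_targets)
```

Under this reading, binary relevance trains one classifier per label and ignores label co-occurrence, while label powerset treats each observed label combination as its own class, which can produce many rare classes on an imbalanced dataset.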

Abstract

Imbalanced datasets arise from the uneven distribution of data available in the real world, such as the disposition of complaints at government offices in Bandung. Consequently, multi-label text categorization algorithms may not produce the best performance, because classifiers tend to be weighed down by the majority of the data and ignore the minority. In this paper, Bagging and Adaptive Boosting algorithms are employed to handle this issue and improve the performance of text categorization. The results are evaluated with four metrics: Hamming loss, subset accuracy, example-based accuracy, and micro-averaged F-measure. Bagging ML-LP with an SMO weak classifier is the best performer in terms of subset accuracy and example-based accuracy. Bagging ML-BR with an SMO weak classifier has the best micro-averaged F-measure among all. On the other hand, AdaBoost MH with a J48 weak classifier has…
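
The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions: Hamming loss as the fraction of mismatched label slots, subset accuracy as exact-match rate, example-based accuracy as the mean per-example Jaccard score, and micro-averaged F-measure pooled over all labels. The paper's exact formulas are not shown on this page, and the function names and toy data below are hypothetical.

```python
from typing import List, Set


def hamming_loss(y_true: List[Set[str]], y_pred: List[Set[str]], n_labels: int) -> float:
    """Fraction of (example, label) slots where prediction and truth disagree."""
    mismatches = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return mismatches / (len(y_true) * n_labels)


def subset_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of examples whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def example_based_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Mean Jaccard score per example; two empty sets count as a perfect match."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = t | p
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def micro_f_measure(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """F-measure computed from counts pooled over all examples and labels."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data with three possible labels.
y_true = [{"roads"}, {"roads", "water"}, {"health"}]
y_pred = [{"roads"}, {"roads"}, {"water"}]

print(hamming_loss(y_true, y_pred, n_labels=3))
print(subset_accuracy(y_true, y_pred))
print(example_based_accuracy(y_true, y_pred))
print(micro_f_measure(y_true, y_pred))
```

Hamming loss is the only one of the four where lower is better, which is why a method can rank first on it without leading on the accuracy or F-measure metrics.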
