Dealing with Difficult Minority Labels in Imbalanced Multilabel Data Sets
Francisco Charte, Antonio J. Rivera, María J. del Jesus, and Francisco Herrera

TL;DR
This paper analyzes the challenge of difficult minority labels in imbalanced multilabel datasets, introduces metrics to measure label concurrence, and proposes a novel resampling algorithm called REMEDIAL to improve classifier performance.
Contribution
It introduces SCUMBLE and SCUMBLELbl metrics for assessing label concurrence and proposes REMEDIAL, a new resampling method to address difficult minority labels in imbalanced multilabel data.
Findings
SCUMBLE metrics effectively quantify label concurrence.
REMEDIAL improves classifier performance on difficult labels.
The approach integrates with the R mldr package.
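To make the two ideas above concrete, here is a minimal Python sketch. It is not taken from this page or from the mldr package. It assumes the commonly cited definitions: IRLbl for a label is the count of the most frequent label divided by that label's count, and the per-instance SCUMBLE score is one minus the ratio of the geometric mean to the arithmetic mean of the IRLbl values of the instance's active labels. The function names (irlbl, scumble_per_instance, scumble, decouple) are hypothetical, and decouple is only one reading of the REMEDIAL idea: instances whose score exceeds the dataset mean are cloned, the original keeps its minority labels, and the clone keeps its majority labels.

```python
import numpy as np


def irlbl(Y):
    """Per-label imbalance ratio: count of the most frequent label / count of each label.

    Assumes Y is a binary (instances x labels) array and every label occurs at least once.
    """
    counts = Y.sum(axis=0).astype(float)
    return counts.max() / counts


def scumble_per_instance(Y):
    """Assumed SCUMBLE definition: 1 - geometric_mean / arithmetic_mean of the
    IRLbl values of the labels active in each instance."""
    ir = irlbl(Y)
    scores = np.zeros(Y.shape[0])
    for i, row in enumerate(Y):
        active = ir[row.astype(bool)]
        if active.size == 0:
            continue  # unlabeled instance: score stays at 0
        geometric_mean = np.exp(np.log(active).mean())
        scores[i] = 1.0 - geometric_mean / active.mean()
    return scores


def scumble(Y):
    """Dataset-level score: mean of the per-instance scores."""
    return scumble_per_instance(Y).mean()


def decouple(X, Y):
    """REMEDIAL-style sketch (an assumption, not the authors' code): split
    high-concurrence instances into a minority-only and a majority-only copy."""
    ir = irlbl(Y)
    minority = ir > ir.mean()           # labels rarer than the mean imbalance ratio
    scores = scumble_per_instance(Y)
    hard = scores > scores.mean()       # instances mixing rare and frequent labels
    Y_out = Y.copy()
    Y_out[hard] = Y[hard] * minority    # originals keep only their minority labels
    clones = Y[hard] * ~minority        # clones keep only their majority labels
    return np.vstack([X, X[hard]]), np.vstack([Y_out, clones])


# Toy data: label 0 is frequent, label 1 is rare and only appears alongside label 0.
Y_toy = np.array([[1, 1], [1, 0], [1, 0], [1, 0]])
X_toy = np.arange(8, dtype=float).reshape(4, 2)
X_new, Y_new = decouple(X_toy, Y_toy)
```

On the toy data only the first instance mixes a rare and a frequent label, so it is the only one with a non-zero score and the only one that gets split. A dataset whose instances never mix labels of different frequency would score 0 and be left unchanged.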
Abstract
Multilabel classification is an emerging data mining task with a broad range of real-world applications. Learning from imbalanced multilabel data has been studied intensively in recent years, and several resampling methods have been proposed in the literature. The unequal label distribution in most multilabel datasets, with disparate imbalance levels, can hinder the training of classifiers, and it also challenges many existing preprocessing algorithms. Furthermore, the concurrence between imbalanced labels can make certain labels harder to learn. These are what we call difficult labels. In this work, the problem of difficult labels is analyzed in depth, its influence on multilabel classifiers is studied, and a novel way to solve it is proposed. Specific metrics to assess this trait in multilabel datasets, called SCUMBLE…
