Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution
Yi Huang, Buse Giledereli, Abdullatif Köksal, Arzucan Özgür, Elif Ozkirimli

TL;DR
This paper shows that distribution-balanced loss functions improve multi-label text classification on long-tailed class distributions by addressing both class imbalance and label dependencies, outperforming commonly used loss functions.
Contribution
It introduces the application of distribution-balanced loss functions to multi-label text classification, showing their effectiveness in natural language processing tasks with long-tailed label distributions.
Findings
Distribution-balanced loss outperforms standard loss functions.
Effective in datasets with long-tailed label distributions.
Applicable to both general and domain-specific datasets.
Abstract
Multi-label text classification is a challenging task because it requires capturing label dependencies. It becomes even more challenging when class distribution is long-tailed. Resampling and re-weighting are common approaches used for addressing the class imbalance problem; however, they are not effective when there is label dependency besides class imbalance because they result in oversampling of common labels. Here, we introduce the application of balancing loss functions for multi-label text classification. We perform experiments on a general domain dataset with 90 labels (Reuters-21578) and a domain-specific dataset from PubMed with 18211 labels. We find that a distribution-balanced loss function, which inherently addresses both the class imbalance and label linkage problems, outperforms commonly used loss functions. Distribution balancing methods have been successfully used in the…
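The abstract's point that resampling ends up oversampling common labels can be illustrated with a small sketch. The toy corpus, the label names, and the helper functions below are hypothetical and are not taken from the paper; the sketch only shows why duplicating documents that carry a rare label also inflates any common label that co-occurs with it.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a set of labels.
# "earn" plays the role of a common (head) label, "rye" a rare (tail) label.
docs = [
    {"earn"}, {"earn"}, {"earn"}, {"earn"},
    {"earn", "rye"},
]

def label_counts(documents):
    """Count how many documents carry each label."""
    counts = Counter()
    for labels in documents:
        counts.update(labels)
    return counts

def oversample_label(documents, label, factor):
    """Repeat every document carrying `label` so it appears `factor` times."""
    out = []
    for labels in documents:
        copies = factor if label in labels else 1
        out.extend([labels] * copies)
    return out

before = label_counts(docs)
after = label_counts(oversample_label(docs, "rye", 4))

print("before:", dict(before))
print("after: ", dict(after))
```

In this toy case the rare label goes from 1 to 4 occurrences, but the common label also grows from 5 to 8, because the only document carrying the rare label carries the common one too. This is the label co-occurrence effect that the abstract says a distribution-balanced loss is meant to address.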
Taxonomy
Topics: Text and Document Classification Technologies · Imbalanced Data Classification Techniques · Topic Modeling
