Pairwise Instance Relation Augmentation for Long-tailed Multi-label Text Classification
Lin Xiao, Pengyu Xu, Liping Jing, Xiangliang Zhang

TL;DR
This paper introduces PIRAN, a data augmentation method for long-tailed multi-label text classification. PIRAN generates diverse instances for underrepresented tail labels, which improves classification performance on those labels.
Contribution
The paper proposes PIRAN, a relation-based augmentation network that balances tail and head labels by generating new tail-label instances at the level of high-level features, with regularizers that encourage the generated instances to be diverse and consistent.
Findings
PIRAN outperforms state-of-the-art methods on three benchmark datasets.
It significantly improves tail label classification accuracy.
The approach effectively balances label distribution in long-tailed datasets.
Abstract
Multi-label text classification (MLTC) is one of the key tasks in natural language processing. It aims to assign multiple target labels to one document. Due to the uneven popularity of labels, the number of documents per label follows a long-tailed distribution in most cases. It is much more challenging to learn classifiers for data-scarce tail labels than for data-rich head labels. The main reason is that head labels usually have sufficient information, e.g., a large intra-class diversity, while tail labels do not. In response, we propose a Pairwise Instance Relation Augmentation Network (PIRAN) to augment tailed-label documents for balancing tail labels and head labels. PIRAN consists of a relation collector and an instance generator. The former aims to extract the document pairwise relations from head labels. Taking these relations as perturbations, the latter tries to generate new…
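The collector/generator split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: PIRAN is a learned network, whereas the code below assumes that a "pairwise relation" is simply the difference between two head-label feature vectors and that generation means adding a scaled relation to a tail-label feature vector. The function names `collect_relations` and `generate_instances`, the `scale` parameter, and the example vectors are all hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def collect_relations(head_features: Sequence[Vector]) -> List[Vector]:
    """Collect one difference vector per ordered pair of head-label documents.

    Assumption: a pairwise relation is modeled as a plain feature difference.
    """
    relations = []
    for i, a in enumerate(head_features):
        for j, b in enumerate(head_features):
            if i != j:
                relations.append([x - y for x, y in zip(a, b)])
    return relations


def generate_instances(
    tail_features: Sequence[Vector],
    relations: Sequence[Vector],
    n_new: int,
    scale: float = 0.5,
    seed: int = 0,
) -> List[Vector]:
    """Perturb randomly chosen tail-label features with randomly chosen relations.

    Assumption: the perturbation is additive and scaled by a fixed factor.
    """
    rng = random.Random(seed)
    generated = []
    for _ in range(n_new):
        anchor = rng.choice(tail_features)
        relation = rng.choice(relations)
        generated.append([x + scale * r for x, r in zip(anchor, relation)])
    return generated


# Toy data: three documents for a head label, one for a tail label.
head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tail = [[5.0, 5.0]]

relations = collect_relations(head)
augmented = generate_instances(tail, relations, n_new=4)
print(len(relations), len(augmented))
```

The intuition is that the spread observed among head-label documents is borrowed to give a tail label more intra-class variety. The regularizers for diversity and consistency mentioned in the summary are not modeled in this sketch.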
Taxonomy
Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Sentiment Analysis and Opinion Mining
