TL;DR
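This paper empirically evaluates a range of large-scale multi-label text classification (LMTC) methods. It shows that hierarchical methods based on Probabilistic Label Trees outperform flat Label-Wise Attention Networks, and that Transformer-based models outperform the previous state of the art on two datasets. It also introduces new approaches that leverage label hierarchies to improve few- and zero-shot learning.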
Contribution
It provides the first comprehensive empirical comparison of LMTC methods, introduces a new state-of-the-art method that combines BERT with Label-Wise Attention Networks (LWANs), and proposes models that leverage label hierarchies for better few- and zero-shot learning.
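To make "label-wise attention" concrete, here is a minimal sketch of a generic LWAN output layer in plain Python. Every label has its own attention query vector, attends over the token representations to build a label-specific document vector, and scores it with its own sigmoid. The function and parameter names (`label_wise_attention`, `attn_vectors`, `out_vectors`, `out_bias`) are hypothetical and chosen for illustration. The sketch assumes the token states come from some encoder. In a BERT-based variant these would plausibly be BERT's top-layer token outputs, but this is not the paper's exact implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_wise_attention(token_states, attn_vectors, out_vectors, out_bias):
    """Return one independent probability per label (hypothetical sketch).

    token_states: n token vectors of size d (assumed encoder outputs).
    attn_vectors: one attention query u_l (size d) per label.
    out_vectors, out_bias: per-label output weights w_l (size d) and bias b_l.
    """
    dim = len(token_states[0])
    probs = []
    for u_l, w_l, b_l in zip(attn_vectors, out_vectors, out_bias):
        # Each label attends over the tokens with its own query vector.
        weights = softmax([dot(h, u_l) for h in token_states])
        # Label-specific document vector: attention-weighted sum of token states.
        doc_l = [sum(a * h[k] for a, h in zip(weights, token_states)) for k in range(dim)]
        # Independent sigmoid per label (multi-label, not a softmax over labels).
        probs.append(sigmoid(dot(w_l, doc_l) + b_l))
    return probs
```

For example, with two tokens `[[1.0, 0.0], [0.0, 1.0]]`, attention queries `[[5.0, 0.0], [0.0, 5.0]]`, and identical output vectors `[[1.0, 0.0], [1.0, 0.0]]`, the first label focuses on the first token and scores higher than the second label. The only difference between the two labels is where each one attends. The per-label sigmoid reflects the multi-label setting, where any number of labels may be assigned to a document.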
Findings
Hierarchical Probabilistic Label Trees outperform flat LWANs.
Transformer-based models outperform the previous state of the art on two datasets.
New models that leverage label hierarchies improve few- and zero-shot learning.
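As background for the first finding, the sketch below shows the generic inference idea behind Probabilistic Label Trees (PLTs). Labels are the leaves of a tree, and each node carries a classifier estimating the probability that the node is relevant given that its parent is relevant. A label's marginal probability is the product of these conditionals along its root-to-leaf path. Because every factor is at most 1, the marginal can only shrink going down the tree, so subtrees below a threshold can be pruned. The conditional probabilities are assumed to be precomputed for one document, and the names (`plt_predict`, `children`, `cond_prob`) are hypothetical. This does not reproduce any specific PLT system evaluated in the paper.

```python
def plt_predict(children, cond_prob, threshold=0.5, root="root"):
    """Return {leaf_label: marginal probability} for leaves at or above threshold.

    children: dict mapping a node to its child nodes (leaves have no entry).
    cond_prob: dict mapping a node to P(node relevant | parent relevant, x),
               assumed precomputed for a single document x.
    """
    results = {}
    # The root's probability defaults to 1.0 if no classifier is given for it.
    stack = [(root, cond_prob.get(root, 1.0))]
    while stack:
        node, p = stack.pop()
        if p < threshold:
            # Marginals can only decrease further down, so prune this subtree.
            continue
        kids = children.get(node, [])
        if not kids:
            # A leaf is an actual label; p is its marginal probability.
            results[node] = p
        for child in kids:
            stack.append((child, p * cond_prob[child]))
    return results
```

With `children = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}` and `cond_prob = {"A": 0.9, "B": 0.2, "a1": 0.8, "a2": 0.3, "b1": 0.9}`, a threshold of 0.5 returns only `a1` (0.9 × 0.8 = 0.72). The whole `B` subtree is skipped without evaluating its leaf. Pruning of this kind is what lets tree-based methods scale to very large label sets.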
Abstract
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications and presents interesting challenges. First, not all labels are well represented in the training set, due to the very large label set and the skewed label distributions of LMTC datasets. Also, label hierarchies and differences in human labelling guidelines may affect graph-aware annotation proximity. Finally, the label hierarchies are periodically updated, requiring LMTC models capable of zero-shot generalization. Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs), which (1) typically treat LMTC as flat multi-label classification; (2) may use the label hierarchy to improve zero-shot learning, although this practice is vastly understudied; and (3) have not been combined with pre-trained Transformers (e.g. BERT), which have led to…
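The abstract notes that LWANs may use the label hierarchy for zero-shot learning. One simple way to see why this helps is to build each label's vector from the words of its descriptor (e.g. its name) rather than learning it only from training examples, and optionally to mix in the parent label's descriptor vector. A label that never appears in training then still gets a usable vector. Such a vector could, for instance, serve as that label's attention query in a label-wise attention layer. The sketch below is an illustrative assumption, not the paper's exact models. The names `label_vector` and `descriptor_vector`, the equal-weight parent averaging, and the embedding lookup `word_vectors` are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def descriptor_vector(label, descriptors, word_vectors):
    """Average the embeddings of the label's descriptor words found in the vocabulary.

    descriptors: dict mapping a label to a list of words (e.g. its name).
    word_vectors: dict mapping a word to a vector (assumed pre-trained embeddings).
    """
    return mean_vector([word_vectors[w] for w in descriptors[label] if w in word_vectors])


def label_vector(label, descriptors, word_vectors, parent=None):
    """Build a label vector from its descriptor, optionally averaged with its parent's.

    parent: dict mapping a label to its parent in the hierarchy, or None to ignore it.
    """
    own = descriptor_vector(label, descriptors, word_vectors)
    if parent and label in parent:
        # Hypothetical choice: equal-weight average with the parent's descriptor vector.
        par_vec = descriptor_vector(parent[label], descriptors, word_vectors)
        return mean_vector([own, par_vec])
    return own
```

With toy embeddings `{"tax": [1.0, 0.0], "law": [0.0, 1.0], "income": [1.0, 1.0]}`, descriptors `{"taxation": ["tax", "law"], "income tax": ["income", "tax"]}`, and hierarchy `{"income tax": "taxation"}`, the vector for "income tax" is `[1.0, 0.5]` on its own and `[0.75, 0.5]` once its parent is mixed in. In this way, the hierarchy pulls related labels closer together.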
Taxonomy
Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need
