Many-Class Text Classification with Matching
Yi Song, Yuxian Gu, Minlie Huang

TL;DR
This paper introduces TCM, a matching-based framework for multi-class text classification that leverages label semantics and achieves significant improvements, especially in low-resource settings and settings with many labels.
Contribution
The paper proposes TCM, a matching-based approach that uses the semantic information of the classification labels to improve multi-class text classification.
Findings
TCM outperforms other text classification paradigms on 4 datasets, each with 20+ labels.
TCM is effective in both few-shot and full-data settings.
Extensive experiments with variants of TCM examine the factors behind the advantages of label-aware matching.
Abstract
In this work, we formulate Text Classification as a Matching problem between the text and the labels and propose a simple yet effective framework named TCM. Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information in the classification labels, which helps distinguish classes better when the number of classes is large, especially in low-resource scenarios. TCM is also easy to implement and compatible with various large pretrained language models. We evaluate TCM on 4 text classification datasets (each with 20+ labels) in both few-shot and full-data settings, where it demonstrates significant improvements over other text classification paradigms. We also conduct extensive experiments with different variants of TCM and discuss the underlying factors of its success. Our method and analyses offer a…
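To make the idea of "classification as matching" concrete, here is a minimal sketch of the general technique, not TCM's actual implementation. It assumes each label comes with a short natural-language description, scores the input text against every description, and applies a softmax over labels. The bag-of-words encoder, cosine similarity, the temperature value, and the function names are all hypothetical stand-ins; the paper uses large pretrained language models instead.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    # Hypothetical stand-in for a pretrained encoder: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, label_descriptions: dict[str, str], temperature: float = 0.1):
    # Match the text against every label description: one score per label.
    text_vec = encode(text)
    scores = {
        label: cosine(text_vec, encode(desc))
        for label, desc in label_descriptions.items()
    }
    # Softmax over labels turns matching scores into a distribution;
    # subtracting the max score keeps exp() numerically stable.
    top = max(scores.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    # Predict the label whose description matches the text best.
    return max(probs, key=probs.get), probs


labels = {
    "sports": "sports game team match score",
    "finance": "finance stock market bank money",
    "weather": "weather rain forecast temperature",
}
prediction, probs = classify("The team won the match with a late score", labels)
print(prediction)  # prints: sports
```

In this sketch, labels enter as text rather than as fixed output indices, so adding a class only requires writing a new description; no classifier weights are tied to a fixed label count.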
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
