Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification
Muberra Ozmen, Joseph Cotnareanu, Mark Coates

TL;DR
This paper introduces an approach to multi-label text classification that reduces reliance on annotated data by combining a signed label dependency graph with a collective loss, improving performance in low-supervision settings.
Contribution
The proposed method replaces extensive annotation with label descriptions, a label dependency graph, and a collective loss, and it outperforms the initial models in scarce-data settings.
Findings
Achieves a 70% improvement in example-based F1 score under low supervision.
Uses label dependency graphs to enhance label likelihood updates.
Operates with minimal additional computational overhead.
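For readers unfamiliar with the metric named in the findings, example-based F1 is commonly computed per document and then averaged over documents. The sketch below follows that common definition; the function name and the convention of scoring an example with no true and no predicted labels as 0 are assumptions made here, not details taken from the paper.

```python
def example_based_f1(y_true, y_pred):
    """Average the per-example F1 between true and predicted label sets."""
    scores = []
    for true_labels, pred_labels in zip(y_true, y_pred):
        true_set, pred_set = set(true_labels), set(pred_labels)
        total = len(true_set) + len(pred_set)
        # Per-example F1 = 2 * |intersection| / (|true| + |predicted|).
        # Assumed convention: an example with both sets empty scores 0.
        scores.append(2 * len(true_set & pred_set) / total if total else 0.0)
    return sum(scores) / len(scores)


# Two toy documents: the first prediction misses one label, the second is exact.
toy_true = [{"sports", "football"}, {"finance"}]
toy_pred = [{"sports"}, {"finance"}]
toy_score = example_based_f1(toy_true, toy_pred)
```

Because the score is averaged per document, a document with many labels does not outweigh one with a single label.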
Abstract
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text, and has a wide range of application domains. Most existing approaches require an enormous amount of annotated data to learn a classifier and/or a set of well-defined constraints on the label space structure, such as hierarchical relations, which may be complicated to provide as the number of labels increases. In this paper, we study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels. Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing…
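The abstract excerpt is cut off before it specifies the update rule, so the following is only a rough sketch of what step (3) could look like, not the authors' algorithm. It assumes that steps (1) and (2) have already produced a list of preliminary likelihoods and a symmetric signed adjacency matrix; the values, the label names, the function `update_likelihoods`, and the `step` and `rounds` parameters are all hypothetical. Each label's score is nudged up by likely neighbours connected through positive edges and down by likely neighbours connected through negative edges.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))


def update_likelihoods(prelim, signed_adj, step=0.5, rounds=1):
    """Illustrative signed message passing (an assumed rule, not the paper's).

    prelim:     preliminary label likelihoods in (0, 1), one per label
    signed_adj: n x n matrix, positive for related labels, negative for
                conflicting labels, zero for no edge
    """
    n = len(prelim)
    scores = [logit(p) for p in prelim]
    for _ in range(rounds):
        probs = [sigmoid(s) for s in scores]
        # Neighbour j sends its centred belief (prob - 0.5), signed by the edge.
        messages = [
            sum(signed_adj[i][j] * (probs[j] - 0.5) for j in range(n) if j != i)
            for i in range(n)
        ]
        scores = [s + step * m for s, m in zip(scores, messages)]
    return [sigmoid(s) for s in scores]


# Hypothetical inputs standing in for the outputs of steps (1) and (2).
labels = ["sports", "football", "finance"]
prelim = [0.9, 0.5, 0.5]
signed_adj = [
    [0.0, 1.0, -1.0],   # sports supports football, conflicts with finance
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
]
updated = update_likelihoods(prelim, signed_adj)
```

In this toy case the confident "sports" prediction raises the likelihood of "football" above 0.5 and lowers that of "finance" below 0.5, while "sports" itself is unchanged because its neighbours are undecided.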
Taxonomy
Topics: Text and Document Classification Technologies · Music and Audio Processing · Natural Language Processing Techniques
