Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification

Muberra Ozmen; Joseph Cotnareanu; Mark Coates

arXiv:2309.13543·cs.CL·September 26, 2023

TL;DR

This paper introduces an approach to multi-label text classification that reduces reliance on annotated data by combining a label dependency graph with a collective loss, improving performance in low-supervision settings.

Contribution

The proposed method reduces the need for extensive annotation by leveraging label descriptions and a label dependency graph together with a collective loss, and it outperforms the initial models in scarce-data settings.

Findings

01 Achieves a 70% improvement in example-based F1 score under low supervision.

02 Uses a label dependency graph to refine the preliminary label likelihoods.

03 Operates with minimal additional computational overhead.
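The first finding is stated in terms of example-based F1. As a minimal sketch of how that metric is commonly computed, the function below scores each example by the F1 between its true and predicted label sets and then averages over examples. The function name `example_based_f1` and the convention of scoring an example as 1.0 when both sets are empty are assumptions made for illustration, not details taken from the paper.

```python
def example_based_f1(true_sets, pred_sets):
    """Average per-example F1 between true and predicted label sets.

    Hypothetical helper for illustration; the empty-vs-empty convention
    (score 1.0) is an assumption, and other conventions exist.
    """
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one example is required")

    scores = []
    for y_true, y_pred in zip(true_sets, pred_sets):
        y_true, y_pred = set(y_true), set(y_pred)
        denom = len(y_true) + len(y_pred)
        if denom == 0:
            # Both label sets are empty: treat as a perfect match (assumption).
            scores.append(1.0)
        else:
            # F1 for sets reduces to 2 * |intersection| / (|true| + |pred|).
            scores.append(2 * len(y_true & y_pred) / denom)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [{"sports", "health"}, {"politics"}]
    preds = [{"sports"}, {"finance"}]
    print(example_based_f1(truth, preds))
```

Because the score is averaged per example rather than per label, documents with many labels and documents with few labels contribute equally to the final number.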

Abstract

Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text, and has a wide range of application domains. Most existing approaches require an enormous amount of annotated data to learn a classifier and/or a set of well-defined constraints on the label space structure, such as hierarchical relations, which may be complicated to provide as the number of labels increases. In this paper, we study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels. Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing…
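To make step (3) concrete, here is a toy sketch of how message passing over a signed graph could adjust preliminary label likelihoods. It is not the paper's update rule, which is not given in the text above: the function `propagate`, the `step` and `rounds` parameters, the normalization by total absolute edge weight, and the clipping to [0, 1] are all assumptions chosen for illustration. Positive edges pull a label's likelihood up when its neighbor is likely, and negative edges push it down.

```python
def propagate(likelihoods, signed_adj, step=0.1, rounds=1):
    """Toy signed message passing over label likelihoods.

    Hypothetical illustration only, not the authors' method.
    likelihoods: list of floats in [0, 1], one per label.
    signed_adj:  square matrix (list of lists); signed_adj[i][j] > 0 means
                 labels i and j tend to co-occur, < 0 means they conflict.
    """
    p = list(likelihoods)
    n = len(p)
    for _ in range(rounds):
        updated = []
        for i in range(n):
            neighbors = [j for j in range(n) if j != i and signed_adj[i][j] != 0]
            norm = sum(abs(signed_adj[i][j]) for j in neighbors)
            if norm == 0:
                # Isolated label: keep its likelihood unchanged.
                updated.append(p[i])
                continue
            # Signed, normalized message from neighboring labels.
            message = sum(signed_adj[i][j] * p[j] for j in neighbors) / norm
            # Nudge the likelihood and clip it back into [0, 1].
            updated.append(min(1.0, max(0.0, p[i] + step * message)))
        p = updated  # synchronous update: all labels use the previous round
    return p


if __name__ == "__main__":
    prelim = [0.9, 0.5, 0.5]
    graph = [
        [0, 1, -1],   # label 0 supports label 1 and conflicts with label 2
        [1, 0, 0],
        [-1, 0, 0],
    ]
    print(propagate(prelim, graph))
```

In this toy graph, the confident label 0 raises the likelihood of label 1 and lowers that of label 2, while its own likelihood is unchanged because the two incoming messages cancel.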

Code & Models

Repositories

muberraozmen/bncl
PyTorch · Official

Taxonomy

Topics: Text and Document Classification Technologies · Music and Audio Processing · Natural Language Processing Techniques