Open-world Multi-label Text Classification with Extremely Weak Supervision

Xintong Li; Jinya Jiang; Ria Dharmani; Jayanth Srinivasa; Gaowen Liu; Jingbo Shang

arXiv:2407.05609·cs.CL·July 9, 2024

TL;DR

This paper introduces X-MLClass, a novel approach for open-world multi-label text classification under extremely weak supervision, leveraging large language models and iterative label space expansion to improve coverage and accuracy.

Contribution

The paper proposes a new iterative method, X-MLClass, that effectively discovers comprehensive label spaces and enhances multi-label classification performance with minimal supervision.

Findings

01. 40% improvement in label space coverage on the AAPD dataset

02. Achieves state-of-the-art end-to-end classification accuracy

03. Effective discovery of long-tail labels through iterative refinement

Abstract

We study open-world multi-label text classification under extremely weak supervision (XWS), where the user provides only a brief description of the classification objectives, without any labels or ground-truth label space. Similar single-label XWS settings have been explored recently; however, these methods cannot be easily adapted to the multi-label setting. We observe that (1) most documents have a dominant class covering the majority of their content and (2) long-tail labels appear in some documents as the dominant class. Therefore, we first use the user description to prompt a large language model (LLM) for dominant keyphrases of a subset of raw documents, and then construct an (initial) label space via clustering. We further apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels. We…
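
The "revisit documents with small top predicted scores" step in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the X-MLClass repository: the function names `lowest_confidence_docs` and `expand_label_space`, the toy score table standing in for zero-shot classifier output, and the hard-coded keyphrases standing in for LLM output. The sketch also merges new keyphrases by exact string match, a simplification of the clustering the abstract describes.

```python
from typing import Dict, List


def lowest_confidence_docs(scores: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return the k document ids whose highest label score is smallest."""
    # Top predicted score per document.
    top_score = {doc_id: max(label_scores.values())
                 for doc_id, label_scores in scores.items()}
    # Ascending sort puts the least confidently classified documents first.
    return sorted(top_score, key=top_score.get)[:k]


def expand_label_space(label_space: List[str],
                       keyphrases: Dict[str, str],
                       doc_ids: List[str]) -> List[str]:
    """Append the dominant keyphrase of each revisited document if it is new.

    Assumption: exact (case-insensitive) matching replaces clustering here.
    """
    expanded = list(label_space)
    seen = {label.lower() for label in expanded}
    for doc_id in doc_ids:
        phrase = keyphrases[doc_id]
        if phrase.lower() not in seen:
            expanded.append(phrase)
            seen.add(phrase.lower())
    return expanded


# Hypothetical zero-shot scores against an initial two-label space.
scores = {
    "doc_a": {"machine learning": 0.91, "databases": 0.12},
    "doc_b": {"machine learning": 0.22, "databases": 0.18},
    "doc_c": {"machine learning": 0.35, "databases": 0.88},
}
# Hypothetical dominant keyphrases, standing in for LLM output.
keyphrases = {
    "doc_a": "machine learning",
    "doc_b": "information theory",
    "doc_c": "databases",
}

label_space = ["machine learning", "databases"]
revisit = lowest_confidence_docs(scores, k=1)
new_label_space = expand_label_space(label_space, keyphrases, revisit)
print(revisit, new_label_space)
```

In this toy input, `doc_b` has no label scoring above 0.22, so it is the document revisited, and its keyphrase becomes a candidate long-tail label. Repeating the two steps with the enlarged label space gives a rough picture of the iterative refinement the abstract refers to.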

Code & Models

Repositories

Kaylee0501/X-MLClass (official)

Videos

Open-world Multi-label Text Classification with Extremely Weak Supervision (Underline)

Taxonomy

Topics: Text and Document Classification Technologies