Unsupervised Label Refinement Improves Dataless Text Classification

Zewei Chu; Karl Stratos; Kevin Gimpel

arXiv:2012.04194 · cs.CL · December 9, 2020 · 1 cites

TL;DR

This paper introduces an unsupervised label refinement method using clustering to enhance dataless text classification, making it more robust and less dependent on label description quality.

Contribution

It proposes a clustering-based approach to refine predictions in dataless classification, improving performance and robustness across multiple datasets and classifier architectures.

Findings

01 Consistent performance improvements across datasets.
02 Enhanced robustness to label description variations.
03 Effective with different classifier architectures.

Abstract

Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label set for each downstream task. This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice. In this paper, we ask the following question: how can we improve dataless text classification using the inputs of the downstream task dataset? Our primary solution is a clustering-based approach. Given a dataless classifier, our approach refines its set of predictions using k-means clustering. We demonstrate the broad applicability of our approach by improving the performance of two widely used classifier architectures, one that encodes…
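The refinement idea in the abstract can be illustrated with a small sketch. The abstract states only that the predictions of a dataless classifier are refined with k-means clustering; everything else here is an assumption made for illustration: the function name `refine_with_kmeans`, the inputs `doc_vectors` and `initial_scores`, seeding each cluster from the documents the classifier initially assigned to that label, and the use of squared Euclidean distance. It is not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_with_kmeans(doc_vectors, initial_scores, n_iters=20):
    """Refine dataless predictions with k-means (illustrative sketch only).

    doc_vectors:    one feature vector per document (hypothetical input).
    initial_scores: one row per document, one dataless score per label.
    Assumption: cluster k is seeded from the documents whose initial
    argmax prediction is label k, so cluster ids double as label ids.
    """
    n_labels = len(initial_scores[0])
    # Initial hard predictions from the dataless classifier's scores.
    assignments = [max(range(n_labels), key=row.__getitem__)
                   for row in initial_scores]
    centroids = [None] * n_labels  # None = label has never had a member

    for _ in range(n_iters):
        # Update step: each centroid becomes the mean of its members.
        for k in range(n_labels):
            members = [v for v, a in zip(doc_vectors, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
        # Assignment step: nearest existing centroid wins.
        live = [k for k in range(n_labels) if centroids[k] is not None]
        new_assignments = [
            min(live, key=lambda k: squared_distance(v, centroids[k]))
            for v in doc_vectors
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
    return assignments


# Toy data: two well-separated groups; the third document is initially
# mislabeled (score favors label 1) and is corrected by clustering.
docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
scores = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
          [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
refined = refine_with_kmeans(docs, scores)
print(refined)  # -> [0, 0, 0, 1, 1, 1]
```

Seeding the clusters from the initial predictions keeps each cluster tied to a label, so no separate cluster-to-label matching step is needed in this sketch. A label that never receives a document simply has no centroid here, which is a simplification.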

Code & Models

Repositories

ZeweiChu/ULR (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques