Unsupervised Label Refinement Improves Dataless Text Classification
Zewei Chu, Karl Stratos, Kevin Gimpel

TL;DR
This paper introduces an unsupervised label refinement method using clustering to enhance dataless text classification, making it more robust and less dependent on label description quality.
Contribution
The paper proposes a k-means clustering approach that refines the predictions of an existing dataless classifier, improving performance and robustness across multiple datasets and classifier architectures.
Findings
Consistent performance improvements across datasets.
Enhanced robustness to label description variations.
Effective with different classifier architectures.
Abstract
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label set for each downstream task. This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice. In this paper, we ask the following question: how can we improve dataless text classification using the inputs of the downstream task dataset? Our primary solution is a clustering based approach. Given a dataless classifier, our approach refines its set of predictions using k-means clustering. We demonstrate the broad applicability of our approach by improving the performance of two widely used classifier architectures, one that encodes…
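The refinement step described in the abstract can be illustrated with a minimal sketch. It assumes documents are already encoded as fixed-size vectors and that a dataless classifier has produced one initial label per document. The function name `refine_predictions`, the choice to seed each centroid from the classifier's own predictions, and the empty-label fallback are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def refine_predictions(doc_vecs, initial_labels, num_labels, num_iters=20):
    """Refine dataless predictions with k-means seeded by the initial labels.

    Hypothetical helper: the seeding strategy is an assumption of this sketch.
    """
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    labels = np.asarray(initial_labels).copy()

    # Seed one centroid per label from the documents initially assigned to it.
    centroids = np.zeros((num_labels, doc_vecs.shape[1]))
    for k in range(num_labels):
        members = doc_vecs[labels == k]
        # Assumption: fall back to the global mean if a label got no documents.
        centroids[k] = members.mean(axis=0) if len(members) else doc_vecs.mean(axis=0)

    for _ in range(num_iters):
        # Assignment step: each document moves to its nearest centroid.
        dists = ((doc_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for k in range(num_labels):
            members = doc_vecs[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels

# Toy data: two well-separated groups with two noisy initial predictions.
docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
noisy = [0, 0, 1, 1, 1, 0]
refined = refine_predictions(docs, noisy, num_labels=2)
```

Because each cluster index starts out tied to a label, the refined cluster assignments can be read directly as label predictions. In this toy case the two mislabeled documents are pulled back into the group they sit in geometrically.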
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
