Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop
Katherine Bailey, Sunny Chopra

TL;DR
This paper presents a few-shot text classification method that combines pre-trained word embeddings with a human-in-the-loop process, so that large unlabeled corpora can be labeled with minimal manual effort.
Contribution
It introduces a novel few-shot classification approach that uses pre-trained embeddings and human-in-the-loop labeling, reducing the need for extensive labeled data.
Findings
Achieved competitive accuracy on existing datasets
Demonstrated effectiveness with minimal manual labeling
Provided reproducible code for the 20 Newsgroups dataset
Abstract
Most of the literature around text classification treats it as a supervised learning problem: given a corpus of labeled documents, train a classifier such that it can accurately predict the classes of unseen documents. In industry, however, it is not uncommon for a business to have entire corpora of documents where few or none have been classified, or where existing classifications have become meaningless. With web content, for example, poor taxonomy management can result in labels being applied indiscriminately, making filtering by these labels unhelpful. Our work aims to make it possible to classify an entire corpus of unlabeled documents using a human-in-the-loop approach, where the content owner manually classifies just one or two documents per category and the rest can be automatically classified. This "few-shot" learning approach requires rich representations of the documents such…
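The workflow the abstract describes (a content owner labels one or two documents per category, and the remaining documents are assigned automatically) can be illustrated with a minimal sketch. The abstract is cut off before it describes the document representation, so everything specific here is an assumption rather than the authors' method: documents are represented as the average of their word vectors, similarity is cosine similarity, and each document takes the label of its most similar seed document. The `TOY_EMBEDDINGS` table, the `seeds` dictionary, and the function names are hypothetical and exist only for illustration; a real run would load pre-trained vectors in place of the toy table.

```python
import math

# Toy 3-dimensional vectors invented for illustration (an assumption);
# a real run would load pre-trained word embeddings instead.
TOY_EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "team": [0.7, 0.3, 0.0],
    "cpu": [0.0, 0.9, 0.2],
    "software": [0.1, 0.8, 0.3],
    "keyboard": [0.1, 0.9, 0.1],
}


def embed_document(text, embeddings):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text, seeds, embeddings):
    """Return the label of the most similar hand-labeled seed document."""
    doc_vec = embed_document(text, embeddings)
    if doc_vec is None:
        return None  # no known words, so no basis for a prediction
    best_label, best_score = None, -1.0
    for label, seed_texts in seeds.items():
        for seed in seed_texts:
            seed_vec = embed_document(seed, embeddings)
            if seed_vec is None:
                continue
            score = cosine(doc_vec, seed_vec)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# The human in the loop supplies one labeled document per category.
seeds = {"sports": ["team match goal"], "computing": ["cpu software"]}

for doc in ["the team scored a goal", "my keyboard and cpu"]:
    print(doc, "->", classify(doc, seeds, TOY_EMBEDDINGS))
```

Returning `None` for documents with no known words keeps the sketch from guessing; in practice such documents could be routed back to the human reviewer.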
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
