Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop

Katherine Bailey; Sunny Chopra

arXiv:1804.02063 · cs.CL · April 9, 2018 · 6 cites

TL;DR

This paper presents a few-shot text classification method that combines pre-trained word embeddings with a human-in-the-loop process, so that a large unlabeled corpus can be classified after a person manually labels only one or two documents per category.

Contribution

The paper introduces a few-shot classification approach that combines pre-trained word embeddings with human-in-the-loop labeling, reducing the need for a large labeled training set.

Findings

1. Achieved competitive accuracy on existing datasets.
2. Demonstrated effectiveness with minimal manual labeling.
3. Provided reproducible code for the 20 Newsgroups dataset.

Abstract

Most of the literature around text classification treats it as a supervised learning problem: given a corpus of labeled documents, train a classifier such that it can accurately predict the classes of unseen documents. In industry, however, it is not uncommon for a business to have entire corpora of documents where few or none have been classified, or where existing classifications have become meaningless. With web content, for example, poor taxonomy management can result in labels being applied indiscriminately, making filtering by these labels unhelpful. Our work aims to make it possible to classify an entire corpus of unlabeled documents using a human-in-the-loop approach, where the content owner manually classifies just one or two documents per category and the rest can be automatically classified. This "few-shot" learning approach requires rich representations of the documents such…
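The abstract is cut off before it describes the document representation the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: each document is represented as the average of pre-trained word vectors, the one or two human-labeled documents per category are averaged into a class prototype, and every remaining document is assigned to the most cosine-similar prototype. The nearest-centroid rule, the helper names (`embed`, `build_prototypes`, `classify`) and the tiny `TOY_EMBEDDINGS` table are hypothetical illustrations, not the authors' method or the code in their repository.

```python
import math
from typing import Dict, List

# HYPOTHETICAL stand-in for pre-trained word embeddings (e.g. word2vec or GloVe).
# These 3-d vectors are invented for illustration; real embeddings have
# hundreds of dimensions and are loaded from a pre-trained file.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "team": [0.7, 0.1, 0.2],
    "kernel": [0.0, 0.9, 0.1],
    "compile": [0.1, 0.8, 0.0],
    "linux": [0.1, 0.9, 0.2],
}


def embed(doc: str, emb: Dict[str, List[float]]) -> List[float]:
    """Represent a document as the mean of its in-vocabulary word vectors."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[w] for w in doc.lower().split() if w in emb]
    if not vecs:  # no known words: fall back to the zero vector
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_prototypes(labeled: Dict[str, List[str]],
                     emb: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Average the embeddings of the few human-labeled documents per class."""
    protos = {}
    for label, docs in labeled.items():
        vecs = [embed(d, emb) for d in docs]
        protos[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return protos


def classify(doc: str, protos: Dict[str, List[float]],
             emb: Dict[str, List[float]]) -> str:
    """Assign the class whose prototype is most cosine-similar to the document."""
    v = embed(doc, emb)
    return max(protos, key=lambda label: cosine(v, protos[label]))


# A person labels one or two documents per category ...
labeled = {
    "sports": ["team goal", "match"],
    "computing": ["linux kernel"],
}
protos = build_prototypes(labeled, TOY_EMBEDDINGS)

# ... and the rest of the corpus is classified automatically.
for doc in ["the team won the match", "compile the kernel"]:
    print(doc, "->", classify(doc, protos, TOY_EMBEDDINGS))
# prints:
# the team won the match -> sports
# compile the kernel -> computing
```

Averaging word vectors is one simple way to get a fixed-size document representation; in a human-in-the-loop setting, documents whose best and second-best similarities are close would be natural candidates to send back to the content owner for review.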

Code & Models

Repositories

katbailey/few-shot-text-classification (official)

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques