Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

Wenpeng Yin; Jamaal Hay; Dan Roth

arXiv:1909.00161·cs.CL·September 4, 2019·36 cites

TL;DR

This paper benchmarks zero-shot text classification across diverse datasets and aspects, proposing standardized evaluation methods and a textual entailment approach to improve the understanding and performance of zero-shot classification models.

Contribution

It introduces diverse datasets, extends evaluation protocols to fully unseen labels, and unifies zero-shot classification under a textual entailment framework.

Findings

01 Datasets cover multiple aspects like topic, emotion, and situation.

02 Evaluation includes label-fully-unseen zero-shot classification.

03 Textual entailment formulation improves zero-shot classification performance.
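
The entailment formulation in the findings above can be sketched roughly as follows: each candidate label is turned into a hypothesis sentence, the input text is treated as the premise, and the label whose hypothesis scores highest is returned. This is a minimal illustration and not the authors' implementation. The template wording, the function names, the threshold, and the word-overlap scorer are all assumptions made for the example; in practice a trained entailment model would take the scorer's place.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical hypothesis template; the wording is an assumption for illustration.
TEMPLATE = "This text is about {label}."


def build_hypotheses(labels: List[str], template: str = TEMPLATE) -> Dict[str, str]:
    """Turn each label name into a natural-language hypothesis sentence."""
    return {label: template.format(label=label) for label in labels}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (NOT a real entailment model): the fraction of
    hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = hypothesis.lower().replace(".", "").split()
    hits = sum(word in premise_words for word in hypothesis_words)
    return hits / len(hypothesis_words)


def classify(
    text: str,
    labels: List[str],
    scorer: Callable[[str, str], float] = toy_entailment_score,
    threshold: float = 0.0,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Score every (text, hypothesis) pair and return the best label,
    or None when no hypothesis scores above the threshold."""
    hypotheses = build_hypotheses(labels)
    scores = {label: scorer(text, hyp) for label, hyp in hypotheses.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


if __name__ == "__main__":
    prediction, all_scores = classify(
        "Sports fans celebrated after the final match.",
        ["sports", "politics", "health"],
    )
    print(prediction, all_scores)
```

Because the labels enter only through the hypothesis text, the same function handles labels never seen in training, which is what makes the formulation a fit for the label-fully-unseen setting. Swapping `scorer` for a function backed by a real entailment model requires no other change.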

Abstract

Zero-shot text classification (0Shot-TC) is a challenging NLU problem that has received little attention from the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event) described by the label. Only a few articles study 0Shot-TC, and all of them focus on topical categorization which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the chaotic experiments in the literature permit no uniform comparison, which blurs the progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0Shot-TC relative to conceptually different and diverse aspects: the "topic" aspect includes "sports" and…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies