Annotation Artifacts in Natural Language Inference Data

Suchin Gururangan; Swabha Swayamdipta; Omer Levy; Roy Schwartz; Samuel R. Bowman; Noah A. Smith

arXiv:1803.02324·cs.CL·April 18, 2018

TL;DR

This paper shows that a significant portion of SNLI and MultiNLI contains annotation artifacts that allow the label to be predicted from the hypothesis alone, which calls into question how difficult the task really is and how reliably models are evaluated on it.

Contribution

It demonstrates that simple models can predict inference labels from hypotheses alone, exposing dataset biases and artifacts in NLI data.

Findings

01

A simple text categorization model correctly classifies the hypothesis alone in about 67% of SNLI

02

The same kind of model correctly classifies the hypothesis alone in about 53% of MultiNLI

03

Linguistic phenomena like negation and vagueness correlate with inference classes

Abstract

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et al., 2015) and 53% of MultiNLI (Williams et al., 2017). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the…
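
To make the hypothesis-only setup described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' model: the abstract only says "a simple text categorization model", so a small multinomial naive Bayes classifier stands in for it as an assumption. The training sentences are invented for illustration and are not taken from SNLI or MultiNLI; the function names `train_hypothesis_only` and `predict` are hypothetical. The point is only that the classifier never sees a premise, yet cue words such as negation can still steer it toward a label.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up hypotheses for illustration only (NOT from SNLI or MultiNLI).
# Note that no premise appears anywhere in this data.
TRAIN = [
    ("a person is outdoors", "entailment"),
    ("there is an animal outside", "entailment"),
    ("some people are outdoors", "entailment"),
    ("nobody is outside", "contradiction"),
    ("the man is not sleeping", "contradiction"),
    ("no animal is sleeping", "contradiction"),
    ("the man is waiting for his friend", "neutral"),
    ("the woman is tall and sad", "neutral"),
    ("the people are going to a party", "neutral"),
]


def train_hypothesis_only(examples):
    """Collect label and per-label token counts from hypotheses alone."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for hypothesis, label in examples:
        label_counts[label] += 1
        for token in hypothesis.lower().split():
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, hypothesis):
    """Return the label with the highest naive Bayes log-score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in hypothesis.lower().split():
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_hypothesis_only(TRAIN)
print(predict(model, "nobody is sleeping"))  # prints "contradiction"
```

On this toy data the negation word "nobody" pushes the prediction toward contradiction without any premise being consulted, which mirrors the kind of artifact the abstract describes. The accuracies reported in the abstract come from the authors' own model on the real datasets, not from a sketch like this one.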
