Stress Test Evaluation for Natural Language Inference

Aakanksha Naik; Abhilasha Ravichander; Norman Sadeh; Carolyn Rose; Graham Neubig

arXiv:1806.00692 · cs.CL · June 15, 2018 · 41 cites

TL;DR

This paper introduces stress tests to evaluate whether natural language inference models truly understand semantic content, revealing their strengths and weaknesses across challenging linguistic phenomena.

Contribution

It proposes a novel stress test methodology for assessing the inferential capabilities of NLI models beyond standard datasets.

Findings

01 Models show varying performance on linguistic phenomena

02 Stress tests reveal specific weaknesses in models

03 Results suggest directions for improving NLI systems

Abstract

Natural language inference (NLI) is the task of determining if a natural language hypothesis can be inferred from a given premise in a justifiable manner. NLI was proposed as a benchmark task for natural language understanding. Existing models perform well at standard datasets for NLI, achieving impressive results across different genres of text. However, the extent to which these models understand the semantic content of sentences is unclear. In this work, we propose an evaluation methodology consisting of automatically constructed "stress tests" that allow us to examine whether systems have the ability to make real inferential decisions. Our evaluation of six sentence-encoder models on these stress tests reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.
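
As a rough illustration of what an automatically constructed stress test could look like, the sketch below appends a label-preserving tautology to each hypothesis and compares a model's accuracy before and after. The distractor phrase, the `NLIExample` type, the helper names, the two example pairs, and the toy `overlap_model` are all assumptions made for this illustration; they are not taken from the text above, and the toy model merely stands in for a real sentence encoder.

```python
from typing import Callable, List, NamedTuple


class NLIExample(NamedTuple):
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


# Assumed distractor: a tautology that should leave the gold label unchanged.
TAUTOLOGY = "and true is true"


def make_stress_example(ex: NLIExample, distractor: str = TAUTOLOGY) -> NLIExample:
    """Append the distractor to the hypothesis; the gold label is kept as-is."""
    hyp = ex.hypothesis.rstrip(" .")
    return ex._replace(hypothesis=f"{hyp} {distractor}.")


def accuracy(model: Callable[[str, str], str], examples: List[NLIExample]) -> float:
    """Fraction of examples where the model's prediction matches the gold label."""
    if not examples:
        return 0.0
    correct = sum(model(ex.premise, ex.hypothesis) == ex.label for ex in examples)
    return correct / len(examples)


def overlap_model(premise: str, hypothesis: str) -> str:
    """Toy stand-in model (hypothetical): predicts entailment only when every
    hypothesis word also occurs in the premise, otherwise neutral."""
    p = set(premise.lower().replace(".", "").split())
    h = set(hypothesis.lower().replace(".", "").split())
    return "entailment" if h <= p else "neutral"


# Made-up development pairs, for illustration only.
dev = [
    NLIExample("A man is playing a guitar on stage.", "A man is playing a guitar.", "entailment"),
    NLIExample("Two dogs run across a field.", "Two dogs run.", "entailment"),
]
stress = [make_stress_example(ex) for ex in dev]

original_acc = accuracy(overlap_model, dev)
stress_acc = accuracy(overlap_model, stress)
print(f"original: {original_acc:.2f}  stress: {stress_acc:.2f}")
```

A model that relies on surface word overlap, like the toy one here, loses accuracy once the distractor is added even though the correct label has not changed. That gap between the two scores is the kind of signal a stress test is meant to expose.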

Code & Models

Repositories

AbhilashaRavichander/NLI_StressTest

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification