Adversarial Examples for Evaluating Reading Comprehension Systems

Robin Jia; Percy Liang

arXiv:1707.07328 · cs.CL · July 25, 2017 · 263 cites

TL;DR

This paper introduces an adversarial evaluation method for reading comprehension models. It shows that published systems are easily fooled by automatically generated distractor sentences, which points to the need for models with genuine language understanding.

Contribution

It proposes an adversarial evaluation scheme for SQuAD that cuts the average F1 score of sixteen published models from 75% to 36%, exposing the limitations of current reading comprehension systems.

Findings

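01. With adversarially inserted sentences, the average F1 score of sixteen published models drops from 75% to 36%.

02. When the adversary may add ungrammatical sequences of words, average accuracy on four models falls further to 7%.

03. The inserted sentences are generated automatically and neither change the correct answer nor mislead humans, yet they are enough to distract current models.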

Abstract

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the accuracy of sixteen published models drops from an average of 75% F1 score to 36%; when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%. We hope our insights will motivate the development of new models that…
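The evaluation described in the abstract can be sketched roughly as follows: score a model on the original paragraphs, then on the same paragraphs with a distractor sentence added, and compare the mean F1. This is a minimal illustration, not the authors' code. The names `token_f1`, `add_distractor`, `mean_f1`, and `last_word_model` are hypothetical, the distractor is hand-written rather than automatically generated, and placing it at the end of the paragraph is an assumption of this sketch. The F1 here is a simplified token-overlap score that only lowercases and splits on whitespace.

```python
from collections import Counter
from typing import Callable, List, Tuple

# (paragraph, question, gold answer, distractor sentence); layout is an assumption.
Example = Tuple[str, str, str, str]


def token_f1(prediction: str, gold: str) -> float:
    """Simplified token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def add_distractor(paragraph: str, distractor: str) -> str:
    """Append the distractor sentence to the paragraph (placement is assumed)."""
    return paragraph.rstrip() + " " + distractor


def mean_f1(model: Callable[[str, str], str],
            examples: List[Example],
            adversarial: bool = False) -> float:
    """Average F1 of `model` over `examples`, with or without distractors."""
    scores = []
    for paragraph, question, gold, distractor in examples:
        context = add_distractor(paragraph, distractor) if adversarial else paragraph
        scores.append(token_f1(model(context, question), gold))
    return sum(scores) / len(scores)


def last_word_model(paragraph: str, question: str) -> str:
    """Toy stand-in for a real system: answers with the paragraph's last word."""
    return paragraph.rstrip(".").split()[-1]


EXAMPLES: List[Example] = [
    ("The capital of France is Paris.",
     "What is the capital of France?",
     "Paris",
     "The capital of Spain is Madrid."),
]

if __name__ == "__main__":
    print("clean F1:      ", mean_f1(last_word_model, EXAMPLES))
    print("adversarial F1:", mean_f1(last_word_model, EXAMPLES, adversarial=True))
```

The toy model is deliberately shallow so that the distractor changes its answer while the gold answer stays the same, which is the property the adversarial setting is meant to probe. A real evaluation would substitute a trained reading comprehension model and generated distractors.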

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques