DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

Dheeru Dua; Yizhong Wang; Pradeep Dasigi; Gabriel Stanovsky; Sameer Singh; Matt Gardner

arXiv:1903.00161·cs.CL·April 18, 2019·96 cites

TL;DR

DROP is a reading comprehension benchmark that requires discrete reasoning (such as addition, counting, or sorting) over paragraphs. The best existing systems score far below human performance on it.

Contribution

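This paper introduces DROP, a benchmark dataset emphasizing discrete reasoning over paragraphs, shows that state-of-the-art reading comprehension and semantic parsing methods perform poorly on it, and proposes a model that combines reading comprehension methods with simple numerical reasoning to improve performance.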

Findings

01 Current models achieve only 32.7% F1 on DROP

02 Humans reach 96.0% F1 on the dataset

03 A new combined model improves F1 to 47.0%
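
To give a feel for what an F1 score over answers measures, here is a minimal sketch of token-overlap F1 as it is commonly computed in reading comprehension evaluation. It is a simplification, not DROP's official metric, which additionally handles numeric and multi-span answers. The function name `token_f1` and the example strings are illustrative assumptions.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted answer and a gold answer.

    Simplified illustration only: whitespace tokenization, lowercasing,
    and no special handling of numbers or multi-span answers.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction/gold pairs, not taken from the dataset.
exact = token_f1("3", "3")                      # identical answers
partial = token_f1("Matt Prater", "Prater")     # one extra predicted token
miss = token_f1("Denver", "Prater")             # no shared tokens
```

Under this sketch an exact match scores 1.0, a prediction with one extra token is penalized through precision, and a prediction sharing no tokens with the gold answer scores 0.0.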

Abstract

Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literature on this dataset and show that…
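
The discrete operations named in the abstract (addition, counting, sorting) can be illustrated with a small sketch. The passage below is invented for illustration and is not from the dataset, and the helper `extract_numbers` is a hypothetical name. Real questions also require resolving which mentions a question refers to, which this sketch does not attempt.

```python
import re


def extract_numbers(text: str) -> list[float]:
    """Return every integer or decimal mentioned in text, in order of appearance."""
    return [float(m) for m in re.findall(r"\d+(?:\.\d+)?", text)]


# Invented passage (assumption for illustration only).
passage = (
    "The home team scored field goals of 23, 41, and 37 yards, "
    "while the visitors managed one of 52 yards."
)

numbers = extract_numbers(passage)

# Addition: "How many total yards of field goals did the home team score?"
home_total = sum(numbers[:3])

# Counting: "How many field goals were scored in the game?"
field_goal_count = len(numbers)

# Sorting: "Which field goal was the longest / second longest?"
longest_first = sorted(numbers, reverse=True)

# Subtraction: "How many yards longer was the longest than the shortest?"
spread = max(numbers) - min(numbers)
```

Each question type maps to a different operation over the same extracted values, so picking out a single answer span from the passage is not enough to answer them.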

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications