How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks

Divyansh Kaushik; Zachary C. Lipton

arXiv:1808.04926·cs.CL·August 22, 2018

TL;DR

This paper critically examines popular reading comprehension benchmarks. It shows that simple question-only and passage-only models often perform surprisingly well, which calls into question how difficult these benchmarks really are and whether they require combining question and passage information at all.

Contribution

The paper establishes sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets and shows that several of these benchmarks may not require complex reasoning, challenging common assumptions about their difficulty.

Findings

01

Question-only models perform well on many datasets.

02

Passage-only models exceed 50% accuracy on 14 of the 20 bAbI tasks, sometimes matching the full model.

03

In CBT, the last sentence of each 20-sentence story suffices for comparably accurate prediction.
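
The three findings above all come from withholding part of the input before a model sees it. A minimal Python sketch of such input ablations follows. The `Example` type, the `ablate` function, and the mode names are hypothetical and chosen for illustration; they are not the authors' code, and blanking a field is only one possible way to withhold it.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    """A (question, passage, answer) tuple; the passage is a list of sentences."""
    question: str
    passage: List[str]
    answer: str


def ablate(example: Example, mode: str) -> Example:
    """Return a copy of `example` with part of its input withheld.

    Modes (hypothetical names):
      full          - leave the example unchanged
      question_only - drop the passage entirely
      passage_only  - blank out the question
      last_sentence - keep only the final passage sentence
    """
    if mode == "full":
        return example
    if mode == "question_only":
        return example._replace(passage=[])
    if mode == "passage_only":
        return example._replace(question="")
    if mode == "last_sentence":
        # Slicing with [-1:] also works on an empty passage.
        return example._replace(passage=example.passage[-1:])
    raise ValueError(f"unknown mode: {mode}")


# Toy example, invented for illustration only.
toy = Example(
    question="Where is the ball?",
    passage=["Mary took the ball.", "Mary went to the kitchen."],
    answer="kitchen",
)
last_only = ablate(toy, "last_sentence")
```

Training and evaluating the same model on each ablated variant, then comparing its accuracy with the `full` setting, gives a rough measure of how much a benchmark depends on each part of the input.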

Abstract

Many recent papers address reading comprehension, where examples consist of (question, passage, answer) tuples. Presumably, a model must combine information from both questions and passages to predict corresponding answers. However, despite intense interest in the topic, with hundreds of published papers vying for leaderboard dominance, basic questions about the difficulty of many popular benchmarks remain unanswered. In this paper, we establish sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets, finding that question- and passage-only models often perform surprisingly well. On $14$ out of $20$ bAbI tasks, passage-only models achieve greater than $50\%$ accuracy, sometimes matching the full model. Interestingly, while CBT provides $20$-sentence stories, only the last is needed for comparably accurate prediction. By comparison, SQuAD and CNN appear…
