How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks
Divyansh Kaushik, Zachary C. Lipton

TL;DR
This paper critically examines popular reading comprehension benchmarks and finds that simple models often perform surprisingly well. The results call into question how difficult these benchmarks really are, and whether solving them requires combining information from the question and the passage.
Contribution
The paper establishes sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets. It shows that many of these benchmarks may not require complex reasoning, which challenges common assumptions about their difficulty.
Findings
Question-only models perform well on many datasets.
Passage-only models achieve high accuracy on several tasks.
On CBT, the last sentence of each story suffices for comparably accurate predictions.
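The findings above compare a full model against variants that see only part of each example. A minimal sketch of how such restricted inputs could be built and scored appears below. It assumes that an ablated input is represented by blanking it out, which is an empty passage or an empty question. The `Example` type, the function names, and the toy bAbI-style story are all hypothetical illustrations, not the authors' code or data. In a real baseline, the restricted model would be trained as well as evaluated on the ablated inputs. Here, `model` is simply whatever callable you supply.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical (question, passage, answer) reading-comprehension example."""
    question: str
    passage: Tuple[str, ...]  # passage stored as a tuple of sentences
    answer: str


def question_only(ex: Example) -> Example:
    """Ablate the passage (assumption: represented as an empty passage)."""
    return replace(ex, passage=())


def passage_only(ex: Example) -> Example:
    """Ablate the question (assumption: represented as an empty question)."""
    return replace(ex, question="")


def last_sentence_only(ex: Example) -> Example:
    """Keep only the final passage sentence, mirroring the CBT finding."""
    return replace(ex, passage=ex.passage[-1:])


def ablations(ex: Example) -> Dict[str, Example]:
    """Return the full input plus each restricted variant, keyed by name."""
    return {
        "full": ex,
        "question_only": question_only(ex),
        "passage_only": passage_only(ex),
        "last_sentence_only": last_sentence_only(ex),
    }


def accuracy(model: Callable[[Example], str], data: Sequence[Example]) -> float:
    """Fraction of examples where the model's prediction equals the gold answer."""
    if not data:
        return 0.0
    return sum(model(ex) == ex.answer for ex in data) / len(data)


# Toy bAbI-style example (made up for illustration).
example = Example(
    question="Where is Mary?",
    passage=(
        "Mary went to the kitchen.",
        "John went to the garden.",
        "Mary moved to the hallway.",
    ),
    answer="hallway",
)
variants = ablations(example)
```

Comparing `accuracy` across the keys of `ablations(...)` separates what each input contributes. If the `question_only` or `passage_only` variant scores close to `full`, the benchmark likely does not require combining both inputs.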
Abstract
Many recent papers address reading comprehension, where examples consist of (question, passage, answer) tuples. Presumably, a model must combine information from both questions and passages to predict corresponding answers. However, despite intense interest in the topic, with hundreds of published papers vying for leaderboard dominance, basic questions about the difficulty of many popular benchmarks remain unanswered. In this paper, we establish sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets, finding that question- and passage-only models often perform surprisingly well. On out of bAbI tasks, passage-only models achieve greater than accuracy, sometimes matching the full model. Interestingly, while CBT provides -sentence stories only the last is needed for comparably accurate prediction. By comparison, SQuAD and CNN appear…
