NewsQA: A Machine Comprehension Dataset
Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman

TL;DR
NewsQA is a large, challenging machine comprehension dataset based on news articles, designed to require reasoning beyond simple pattern matching, with a significant gap between human and machine performance.
Contribution
The paper introduces NewsQA, a new dataset for machine comprehension that emphasizes reasoning and exploration, filling a gap in existing datasets.
Findings
Humans outperform neural models significantly on NewsQA.
The dataset requires reasoning beyond word matching.
There is substantial room for improvement in machine comprehension.
Abstract
We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.
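The F1 gap quoted in the abstract is an overlap score between a predicted answer span and a reference span. The following is a minimal sketch of how such a token-level F1 is commonly computed for span-based question answering. It assumes whitespace tokenization and lowercasing, since the abstract does not state the exact normalization used, and the names `token_f1` and `mean_f1` are illustrative rather than part of the NewsQA release.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty spans count as a match; one empty span counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: a shared token counts as often as it occurs in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references):
    """Average token-level F1 over paired predictions and references."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Under this sketch, a prediction that contains the reference plus one extra word, such as "the white house" against "white house", is penalized on precision but not on recall. A human-machine gap would then be the difference between the two systems' `mean_f1` values on the same questions.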
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
