IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

James Ferguson; Matt Gardner; Hannaneh Hajishirzi; Tushar Khot; Pradeep Dasigi

arXiv:2011.07127 · cs.CL · November 17, 2020 · 1 citation

TL;DR

The paper introduces IIRC, a challenging dataset of over 13,000 incomplete information reading comprehension questions based on Wikipedia, designed to evaluate systems' ability to identify and locate missing information across linked documents.

Contribution

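It presents a dataset of questions whose source paragraphs supply only part of the information needed to answer them, so a system must recognize what is missing and find it in linked documents. Existing reading comprehension benchmarks, which mostly assume the context is sufficient, do not test this.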

Findings

01. The baseline model achieves an F1 score of 31.1%.

02. Human performance is estimated at 88.4%.

03. Questions have little lexical overlap with the contexts where their answers appear.
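
For readers unfamiliar with how an F1 score is typically computed for reading comprehension answers, the following is a minimal sketch of token-overlap F1 between a predicted and a reference answer. This page does not state which F1 variant the paper uses, so the whitespace tokenization, lowercasing, and the function name `token_f1` are assumptions made for illustration only.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answer strings (illustrative sketch)."""
    # Assumption: lowercase + whitespace tokenization; real evaluators
    # usually also strip punctuation and articles.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the red car", "red car"))
```

A dataset-level score under this sketch would be the mean of `token_f1` over all questions, which is one common way a figure such as 31.1% could be aggregated.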

Abstract

Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, we present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents. The questions were written by crowd workers who did not have access to any of the linked documents, leading to questions that have little lexical overlap with the contexts where the answers appear. This process also gave many questions without answers, and those that require…
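
The abstract's point about low lexical overlap can be made concrete with a simple measure: the fraction of distinct question tokens that also appear in a context passage. The sketch below is one hypothetical way to compute it. The function name `lexical_overlap`, the whitespace tokenization, and the example strings are illustrative assumptions, not the paper's methodology or data.

```python
def lexical_overlap(question: str, context: str) -> float:
    """Fraction of distinct question tokens that also occur in the context."""
    # Assumption: lowercase + whitespace tokenization, no stopword removal.
    question_tokens = set(question.lower().split())
    context_tokens = set(context.lower().split())
    if not question_tokens:
        return 0.0
    return len(question_tokens & context_tokens) / len(question_tokens)


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the dataset.
    question = "how old was he"
    context = "he was born in 1900"
    print(lexical_overlap(question, context))
```

A low value for a question against the passage that actually contains its answer indicates that simple word matching is unlikely to locate that passage.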

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification