A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner

TL;DR
This paper introduces QASPER, a dataset of 5,049 questions about research papers designed to improve question answering systems' ability to handle complex, document-grounded inquiries in academic contexts.
Contribution
The paper presents QASPER, a dataset of real questions written by NLP practitioners. The questions call for complex reasoning over the full text of a paper, which existing QA datasets do not cover.
Findings
Existing models underperform humans by at least 27 F1 points on this dataset.
Questions require complex reasoning about claims in multiple parts of papers.
The dataset highlights the need for advanced document-grounded QA systems.
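The F1 gap quoted above is easier to interpret with the metric in hand. The following is a minimal sketch of token-level F1 as it is commonly computed for QA answers, assuming simple lowercasing and whitespace tokenization. The function name `token_f1` is illustrative, and the paper's own evaluation script may normalize text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: lowercased whitespace tokens; real evaluation scripts
    often also strip punctuation and articles.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One of four predicted tokens matches the single reference token.
    print(token_f1("the model uses bert", "bert"))
```

On a 0 to 100 scale, a 27-point gap means the average of such per-question scores for models is at least 27 points below the human average.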
Abstract
Readers of academic research papers often read with the goal of answering specific questions. Question Answering systems that can answer those questions can make consumption of the content much more efficient. However, building such tools requires data that reflect the difficulty of the task arising from complex reasoning about claims made in multiple parts of a paper. In contrast, existing information-seeking question answering datasets usually contain questions about generic factoid-type information. We therefore present QASPER, a dataset of 5,049 questions over 1,585 Natural Language Processing papers. Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text. The questions are then answered by a separate set of NLP practitioners who also provide supporting evidence to…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
