A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

Michael Boratko; Harshit Padigela; Divyendra Mikkilineni; Pritish Yuvraj; Rajarshi Das; Andrew McCallum; Maria Chang; Achille Fokoue-Nkoutche; Pavan Kapanipathi; Nicholas Mattei; Ryan Musa; Kartik Talamadupula; Michael Witbrock

arXiv:1806.00358·cs.AI·February 6, 2019

TL;DR

This paper provides a detailed classification of knowledge and reasoning types in the ARC dataset, analyzes label distribution, and shows that human-selected supporting sentences significantly improve neural comprehension models.

Contribution

It introduces a comprehensive framework for classifying knowledge and reasoning in science questions and evaluates the impact of relevant supporting text on model performance.

Findings

01

Label distribution varies across the Challenge Set.

02

Naive retrieval often finds irrelevant sentences.

03

Human-selected support improves model accuracy by 42 points.

Abstract

The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the Challenge Set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient…
