Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord

TL;DR
The paper introduces the AI2 Reasoning Challenge (ARC), a large dataset of grade-school science questions designed to push AI systems towards advanced reasoning and knowledge understanding, highlighting current models' limitations.
Contribution
The paper presents the ARC dataset, including Challenge and Easy sets, along with a large corpus and baseline models, to foster progress in complex question answering.
Findings
The baselines tested, including leading neural models from the SQuAD and SNLI tasks, do not significantly outperform a random baseline on the Challenge Set.
With 7,787 questions, ARC is the largest public-domain set of natural grade-school science questions.
Baseline models struggle with the advanced reasoning that ARC requires.
Abstract
We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult nature of this…
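The partition rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `Question` type, the `partition` function, and the two toy solvers (`always_first`, `always_last`) are hypothetical stand-ins for the retrieval-based and word co-occurrence algorithms, whose details are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Question:
    """Hypothetical multiple-choice question record."""
    stem: str
    choices: Tuple[str, ...]
    answer: int  # index of the correct choice


# A solver maps a question to the index of the choice it picks.
Solver = Callable[[Question], int]


def partition(
    questions: Iterable[Question], solvers: List[Solver]
) -> Tuple[List[Question], List[Question]]:
    """Split questions into (challenge, easy).

    A question goes to the challenge list only if every solver
    answers it incorrectly; otherwise it goes to the easy list.
    """
    challenge, easy = [], []
    for q in questions:
        if all(solve(q) != q.answer for solve in solvers):
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy


# Toy stand-ins (assumptions): real solvers would use a corpus.
def always_first(q: Question) -> int:
    return 0


def always_last(q: Question) -> int:
    return len(q.choices) - 1
```

With these toy solvers, a three-choice question whose correct answer is the middle option lands in the challenge list, because both solvers miss it; a question either solver gets right lands in the easy list.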
Code & Models
- 🤗 nicholasKluge/Aira-2-124M (model) · 340 dl · ♡ 1
- 🤗 nicholasKluge/Aira-2-355M (model) · 343 dl · ♡ 2
- 🤗 nicholasKluge/Aira-2-774M (model) · 350 dl · ♡ 3
- 🤗 nicholasKluge/Aira-2-portuguese-124M (model) · 39 dl · ♡ 3
- 🤗 nicholasKluge/Aira-2-1B5 (model) · 328 dl · ♡ 1
- 🤗 MayaPH/GodziLLa-30B (model) · 844 dl · ♡ 10
- 🤗 MayaPH/GodziLLa2-70B (model) · 929 dl · ♡ 38
- 🤗 TheBloke/GodziLLa2-70B-GGML (model) · 9 dl · ♡ 9
- 🤗 TheBloke/GodziLLa2-70B-GPTQ (model) · 13 dl · ♡ 4
- 🤗 TheBloke/GodziLLa2-70B-GGUF (model) · 189 dl · ♡ 10
Taxonomy
Topics: Topic Modeling · Logic, Reasoning, and Knowledge · Natural Language Processing Techniques
