Improving Question Answering with External Knowledge
Xiaoman Pan, Kai Sun, Dian Yu, Jianshu Chen, Heng Ji, Claire Cardie, Dong Yu

TL;DR
This paper investigates leveraging external knowledge, such as Wikipedia text and additional in-domain training data, to improve multiple-choice science question answering. It achieves significant accuracy gains but also reveals limitations that depend on the difficulty level of the data.
Contribution
It introduces simple methods for incorporating external knowledge into subject-area QA and provides empirical analysis of their effectiveness and limitations.
Findings
Wikipedia-based knowledge enrichment improves accuracy significantly.
Adding more training instances can sometimes degrade performance.
External knowledge integration shows promise but has limitations depending on data difficulty.
Abstract
We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: in comparison to the previous…
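The first method in the abstract enriches the reference corpus with Wikipedia snippets for concepts mentioned in the question and answer options. The sketch below illustrates that idea under stated assumptions. A small in-memory dictionary (page title to snippet) stands in for Wikipedia, and naive exact n-gram title matching stands in for concept identification. The paper's actual linking and retrieval steps are not described in this summary, and every function name here is hypothetical. `augment_training` sketches the second method as plain concatenation of instance lists.

```python
from typing import Dict, Iterable, List


def candidate_ngrams(text: str, max_n: int = 3) -> List[str]:
    """Return lowercase word n-grams (max_n down to 1) of `text`, longest first."""
    tokens = [t.strip(".,;:?!()\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    grams: List[str] = []
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def link_concepts(text: str, titles: Iterable[str], max_n: int = 3) -> List[str]:
    """Hypothetical stand-in for concept linking: exact n-gram match to page titles."""
    index = {t.lower(): t for t in titles}
    found: List[str] = []
    for gram in candidate_ngrams(text, max_n):
        title = index.get(gram)
        if title is not None and title not in found:
            found.append(title)
    return found


def enrich_corpus(corpus: List[str], question: str, options: List[str],
                  wiki: Dict[str, str]) -> List[str]:
    """Append snippets for concepts found in the question and each option."""
    enriched = list(corpus)
    seen = set(enriched)
    for text in [question, *options]:
        for title in link_concepts(text, wiki):
            snippet = wiki[title]
            if snippet not in seen:  # skip passages already in the corpus
                enriched.append(snippet)
                seen.add(snippet)
    return enriched


def augment_training(train: List[dict], extra: List[dict]) -> List[dict]:
    """Sketch of the second method: append extra in-domain instances."""
    return train + extra


# Toy usage with a hypothetical two-page "Wikipedia".
wiki = {
    "Photosynthesis": "Photosynthesis is the process by which plants "
                      "convert light energy into chemical energy.",
    "Chlorophyll": "Chlorophyll is a green pigment found in plants.",
}
corpus = ["Plants need sunlight to grow."]
question = "Which pigment helps plants carry out photosynthesis?"
options = ["chlorophyll", "hemoglobin"]
enriched = enrich_corpus(corpus, question, options, wiki)
print(len(enriched))  # 3: the original sentence plus two linked snippets
```

Deduplicating snippets keeps the enriched corpus from repeating a passage when the same concept appears in both the question and an option. A real system would replace the exact-match linker with a proper entity linker and a full Wikipedia dump.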
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
