Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering
Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa

TL;DR
This paper investigates how underlying reasoning tasks influence multi-hop question answering, showing they improve performance and reduce reasoning shortcuts but do not enhance robustness against adversarial questions.
Contribution
It introduces a multi-task model incorporating sentence-level and entity-level reasoning tasks and provides comprehensive analysis on their effects in multi-hop QA.
Findings
UR tasks improve QA performance
UR tasks help prevent reasoning shortcuts
UR tasks do not enhance robustness to adversarial questions
Abstract
To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets. However, it remains an open question as to how effective UR tasks are for the QA task when training models on both tasks in an end-to-end manner. In this study, we address this question by analyzing the effectiveness of UR tasks (including both sentence-level and entity-level tasks) in three aspects: (1) QA performance, (2) reasoning shortcuts, and (3) robustness. While the previous models have not been explicitly trained on an entity-level reasoning prediction task, we build a multi-task model that performs three tasks together: sentence-level supporting facts prediction, entity-level reasoning prediction, and answer prediction. Experimental results on 2WikiMultiHopQA and HotpotQA-small datasets…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
