MoreHopQA: More Than Multi-hop Reasoning
Julian Schnitzler, Xanh Ho, Jiahao Huang, Florian Boudin, Saku Sugawara, Akiko Aizawa

TL;DR
MoreHopQA introduces a challenging multi-hop question dataset with generative answers, combining various reasoning types to better evaluate large language models' true multi-hop reasoning capabilities.
Contribution
The paper presents a new dataset, MoreHopQA, that extends existing multi-hop questions with additional reasoning layers, and evaluates large language models on this more complex benchmark.
Findings
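Models perform well on the initial multi-hop questions but struggle with the extended questions that add a further reasoning step.
Only about 34-39% of questions are answered with fully correct reasoning, meaning the final answer and every sub-question are answered correctly.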
The dataset is more challenging than previous multi-hop datasets.
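To make the "fully correct" figure concrete, here is a minimal Python sketch of how such a rate could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the `SampleResult` record, its field names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleResult:
    """Hypothetical per-question record (not the paper's data format)."""
    final_correct: bool        # was the final, extended question answered correctly?
    sub_correct: List[bool]    # correctness of each sub-question in the chain


def perfect_reasoning_rate(results: List[SampleResult]) -> float:
    """Fraction of questions where the final answer AND every sub-question are correct."""
    if not results:
        return 0.0
    perfect = sum(1 for r in results if r.final_correct and all(r.sub_correct))
    return perfect / len(results)


# Toy data invented for illustration only; these are not results from the paper.
toy = [
    SampleResult(True, [True, True, True]),
    SampleResult(True, [True, False, True]),   # right final answer, flawed chain
    SampleResult(False, [True, True, False]),
    SampleResult(True, [True, True, True]),
]

final_accuracy = sum(r.final_correct for r in toy) / len(toy)
print(final_accuracy, perfect_reasoning_rate(toy))
```

The gap between the two numbers is the point of the metric: a model can reach the right final answer while getting an intermediate sub-question wrong, and only the stricter rate penalizes that.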
Abstract
Most existing multi-hop datasets are extractive answer datasets, where the answers to the questions can be extracted directly from the provided context. This often leads models to use heuristics or shortcuts instead of performing true multi-hop reasoning. In this paper, we propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers. Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2WikiMultihopQA, and MuSiQue. Instead of relying solely on factual reasoning, we enhance the existing multi-hop questions by adding another layer of questioning that involves one, two, or all three of the following types of reasoning: commonsense, arithmetic, and symbolic. Our dataset is created through a semi-automated process, resulting in a dataset with 1,118 samples that have undergone human verification. We then use our dataset to evaluate…
Taxonomy
Topics: Business Process Modeling and Analysis · AI-based Problem Solving and Planning · Semantic Web and Ontologies
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings
