Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers?

Neeladri Bhuiya; Viktor Schlegel; Stefan Winkler

arXiv:2409.05197·cs.CL·November 1, 2024

TL;DR

This paper investigates whether large language models genuinely perform multi-hop reasoning or exploit superficial cues, revealing their vulnerabilities to plausible yet incorrect reasoning chains and proposing a challenging benchmark.

Contribution

The study uncovers subtle ways LLMs bypass multi-hop reasoning and introduces a new benchmark with plausible distractors to evaluate their reasoning capabilities.

Findings

01. LLM performance drops by up to 45% when plausible distractors are present

02. Models tend to ignore lexical cues but struggle with misleading reasoning paths

03. The proposed benchmark reveals vulnerabilities in the reasoning abilities of current LLMs

Abstract

State-of-the-art Large Language Models (LLMs) are credited with a growing number of capabilities, ranging from reading comprehension through advanced mathematical and reasoning skills to scientific knowledge. In this paper we focus on their multi-hop reasoning capability: the ability to identify and integrate information from multiple textual sources. Existing multi-hop reasoning benchmarks are known to contain simplifying cues that allow models to circumvent the reasoning requirement, so we set out to investigate whether LLMs are prone to exploiting such cues. We find evidence that they do indeed circumvent the requirement to perform multi-hop reasoning, but in more subtle ways than was reported for their fine-tuned pre-trained language model (PLM) predecessors. Motivated by this finding, we propose a…

Code & Models

Repositories

zawedcvg/are-large-language-models-attentive-readers (official)

Videos

Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? · Underline

Taxonomy

Topics: Topic Modeling

Methods: Sparse Evolutionary Training · Focus