How Do LLMs Perform Two-Hop Reasoning in Context?

Tianyu Guo; Hanlin Zhu; Ruiqi Zhang; Jiantao Jiao; Song Mei; Michael I. Jordan; Stuart Russell

arXiv:2502.13913·cs.CL·May 29, 2025

Open Access

TL;DR

This paper investigates how large language models perform two-hop reasoning in context: pre-trained models often fail when distractors are present, fine-tuning brings them to near-perfect accuracy, and a structured attention process emerges during training.

Contribution

It demonstrates the failure modes of LLMs on two-hop reasoning tasks, shows how fine-tuning improves performance, and uncovers the internal attention dynamics by studying training and minimal models.

Findings

01

Pre-trained LLMs often guess randomly on two-hop reasoning with distractors.

02

Fine-tuning leads to near-perfect accuracy and length generalization.

03

Structured attention mechanisms emerge during training, enabling reasoning.

Abstract

"Socrates is human. All humans are mortal. Therefore, Socrates is mortal." This form of argument illustrates a typical pattern of two-hop reasoning. Formally, two-hop reasoning refers to the process of inferring a conclusion by making two logical steps, each connecting adjacent concepts, such that the final conclusion depends on the integration of both steps. It is one of the most fundamental components of human reasoning and plays a crucial role in both formal logic and everyday decision-making. Despite recent progress in large language models (LLMs), we surprisingly find that they can fail at solving simple two-hop reasoning problems when distractors are present. We observe on a synthetic dataset that pre-trained LLMs often resort to random guessing among all plausible conclusions. However, after a few steps of fine-tuning, models achieve near-perfect accuracy and exhibit strong…
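To make the task setup concrete, here is a minimal Python sketch of a two-hop problem with distractors. It is an illustration only, not the authors' dataset or code: the function names (`make_two_hop_problem`, `follow_two_hops`), the `entN` tokens, and the "A is B." template are all hypothetical choices. Each chain contributes two premises, one chain is the target, and the remaining chains act as distractors whose end points are the other plausible conclusions.

```python
import random


def make_two_hop_problem(num_chains=3, seed=0):
    """Build one in-context two-hop problem (hypothetical format).

    Each chain is (source, bridge, end) and yields the premises
    "source is bridge." and "bridge is end.". Chain 0 is the target;
    the other chains are distractors.
    """
    rng = random.Random(seed)
    # Made-up, pairwise-distinct tokens so that no chain overlaps another.
    vocab = [f"ent{i}" for i in range(3 * num_chains)]
    rng.shuffle(vocab)
    chains = [tuple(vocab[3 * i: 3 * i + 3]) for i in range(num_chains)]

    premises = []
    for source, bridge, end in chains:
        premises.append((source, bridge))  # first hop
        premises.append((bridge, end))     # second hop
    rng.shuffle(premises)                  # interleave target and distractors

    query_source, _, answer = chains[0]
    context = " ".join(f"{head} is {tail}." for head, tail in premises)
    return {
        "prompt": f"{context} Therefore, {query_source} is",
        "premises": premises,
        "source": query_source,
        "answer": answer,
        # Every chain end is a plausible conclusion for a guessing model.
        "candidates": [end for _, _, end in chains],
    }


def follow_two_hops(premises, source):
    """Reference solver: follow source -> bridge -> end."""
    edges = dict(premises)
    return edges[edges[source]]


problem = make_two_hop_problem(num_chains=3, seed=0)
print(problem["prompt"])
print("answer:", problem["answer"])
# A model guessing uniformly among the chain ends is right 1/num_chains of the time.
print("chance accuracy:", 1 / len(problem["candidates"]))
```

In this sketch, a model that guesses uniformly among the chain ends would be correct one time in three, which is the kind of baseline the abstract's "random guessing among all plausible conclusions" refers to, while the reference solver recovers the answer by composing the two relevant premises.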

Taxonomy

Topics: Digital Rights Management and Security · Artificial Intelligence in Law