Lessons from Studying Two-Hop Latent Reasoning

Mikita Balesni; Tomek Korbak; Owain Evans

arXiv:2411.16353 · cs.CL · November 25, 2025

TL;DR

This paper investigates latent two-hop reasoning (two-hop question answering without chain-of-thought) in large language models by fine-tuning them on synthetic facts in a controlled setting. Models can combine a synthetic fact with a natural one, but performance depends on the kinds of facts involved.

Contribution

The paper introduces a controlled experimental setting in which a positive result provides definitive evidence of latent two-hop reasoning in LLMs, and it highlights how much experimental design matters when evaluating reasoning capabilities.

Findings

01. Models can perform two-hop reasoning when combining synthetic and natural facts.

02. Performance varies with the nature of the facts involved, indicating nuanced reasoning abilities.

03. The results caution against misreading successes or failures that are actually due to memorization or experimental artifacts.

Abstract

Large language models can use chain-of-thought (CoT) to externalize reasoning, potentially enabling oversight of capable LLM agents. Prior work has shown that models struggle at two-hop question answering without CoT. This capability is so basic that if it were a fundamental limitation, it would imply that many complex agentic tasks would similarly require CoT. We investigate LLM latent reasoning capabilities using two-hop question answering as a case study. Previous work on the gap between latent and externalized two-hop reasoning produced mixed evidence with inconclusive results. In this paper, we introduce a controlled setting for investigating two-hop reasoning in LLMs, where a positive result provides definitive evidence for latent reasoning. We fine-tune LLMs (including Llama 3 8B and GPT-4o) on synthetic facts and test two-hop reasoning over these facts. By using synthetic facts,…
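To make the setup in the abstract concrete, the following is a minimal sketch of how synthetic two-hop data of this kind might be built and scored. Everything in it is an illustrative assumption rather than the authors' dataset or code: the relation templates, the made-up entity names, the helper names `make_two_hop_chains` and `is_latent_correct`, and the scoring rule that treats an answer as latent only when the bridge entity is not written out.

```python
import re


def make_two_hop_chains(n_chains):
    """Build hypothetical fact chains e1 -> e2 -> e3 from made-up entities.

    Each chain has two atomic facts (which could each go into a separate
    fine-tuning document) and one two-hop question that needs both.
    """
    chains = []
    for i in range(n_chains):
        e1, e2, e3 = f"Person_{i}", f"Spouse_{i}", f"City_{i}"
        chains.append({
            "first_hop": f"The spouse of {e1} is {e2}.",
            "second_hop": f"{e2} was born in {e3}.",
            "question": f"In which city was the spouse of {e1} born?",
            "bridge": e2,   # intermediate entity linking the two hops
            "answer": e3,
        })
    return chains


def _mentions(text, entity):
    # Whole-word match so that "City_1" does not match inside "City_10".
    return re.search(rf"\b{re.escape(entity)}\b", text) is not None


def is_latent_correct(model_output, chain):
    """Assumed scoring rule: the answer must appear, and the bridge entity
    must not, since naming it would externalize the intermediate step."""
    return (_mentions(model_output, chain["answer"])
            and not _mentions(model_output, chain["bridge"]))


if __name__ == "__main__":
    chain = make_two_hop_chains(2)[0]
    print(chain["question"])
    print(is_latent_correct(chain["answer"], chain))
```

Because the entities are invented, a model cannot have memorized the two-hop answer before fine-tuning, which is the property the abstract relies on when it says a positive result is definitive evidence for latent reasoning.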

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Fast Attention Via Positive Orthogonal Random Features · LLaMA · Performer