Language models can learn implicit multi-hop reasoning, but only if they have lots of training data

Yuekun Yao; Yupei Du; Dawei Zhu; Michael Hahn; Alexander Koller

arXiv:2505.17923·cs.CL·February 5, 2026

TL;DR

This paper studies whether GPT2-style models trained from scratch can learn implicit multi-hop reasoning. It finds that the required training data grows exponentially, and the required number of transformer layers grows linearly, in the number of reasoning hops, and that curriculum learning mitigates but does not eliminate the data requirement.

Contribution

It shows empirically how data and depth requirements scale with the number of hops in implicit multi-hop reasoning, and provides a theoretical explanation for why the growth in depth is necessary.

Findings

01

Required training data grows exponentially with the number of reasoning hops

02

Required number of transformer layers grows linearly with the number of reasoning hops

03

Curriculum learning mitigates, but does not eliminate, the data requirement

Abstract

Implicit reasoning is the ability of a language model to solve multi-hop reasoning tasks in a single forward pass, without chain of thought. We investigate this capability using GPT2-style language models trained from scratch on controlled $k$-hop reasoning datasets ($k = 2, 3, 4$). We show that while such models can indeed learn implicit $k$-hop reasoning, the required training data grows exponentially in $k$, and the required number of transformer layers grows linearly in $k$. We offer a theoretical explanation for why this depth growth is necessary. We further find that the data requirement can be mitigated, but not eliminated, through curriculum learning.
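
To make the notion of a $k$-hop query concrete, here is a minimal Python sketch of a toy dataset generator. It is an illustration only: the abstract does not describe how the paper's datasets are built, so the entity and relation naming, the random-function facts, and the helpers `make_k_hop_dataset` and `count_queries` are all assumptions made for this example.

```python
import random


def make_k_hop_dataset(num_entities=20, num_relations=4, k=2,
                       num_examples=5, seed=0):
    """Build a toy k-hop dataset (hypothetical setup, not the paper's).

    Each relation is a random function from entities to entities. A k-hop
    query is a start entity plus a sequence of k relations; the answer is
    the entity reached by applying the relations in order.
    """
    rng = random.Random(seed)
    entities = [f"e{i}" for i in range(num_entities)]
    relations = [f"r{j}" for j in range(num_relations)]

    # One-hop facts: (entity, relation) -> entity.
    facts = {(e, r): rng.choice(entities)
             for e in entities for r in relations}

    examples = []
    for _ in range(num_examples):
        start = rng.choice(entities)
        path = tuple(rng.choice(relations) for _ in range(k))
        current = start
        for r in path:
            # Follow one hop at a time to compute the gold answer.
            current = facts[(current, r)]
        examples.append((start, path, current))
    return facts, examples


def count_queries(num_entities, num_relations, k):
    """Number of distinct k-hop queries in this toy setup."""
    return num_entities * num_relations ** k
```

In this toy setup the number of distinct queries is `num_entities * num_relations ** k`, so the query space itself grows exponentially in $k$. That is a property of the sketch, not a reproduction of the paper's measured data requirement. A model answering such a query implicitly would see only the start entity and the relation sequence and would have to produce the final entity without writing out the intermediate ones.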
