Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers

Harsh Kohli; Srinivasan Parthasarathy; Huan Sun; Yuekun Yao

arXiv:2604.07822 · cs.CL · April 10, 2026

TL;DR

This paper explores how recurrent-depth transformers improve implicit multi-hop reasoning and generalization, enabling models to compose knowledge beyond training depths through iterative computation.

Contribution

It studies recurrent-depth transformers, which iterate over the same transformer layers, as a way to improve compositional generalization and depth extrapolation, addressing the limitations of vanilla transformers on implicit reasoning tasks.

Findings

01

Recurrent-depth transformers outperform vanilla transformers in systematic generalization.

02

Scaling inference-time recurrence improves depth extrapolation capabilities.

03

Training strategies influence the effectiveness of depth extrapolation and reveal overthinking limitations.
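
The idea behind inference-time recurrence in the findings above can be illustrated with a toy sketch. This is not the paper's model and not a transformer: the "shared block" here is a hypothetical one-hop lookup over an invented chain of facts, and the names `make_shared_block` and `recurrent_depth_forward` are assumptions made for illustration. It only shows the weight-tying pattern, where one block is reused at every step and the effective depth is chosen at call time.

```python
from typing import Callable, Dict

State = str


def make_shared_block(facts: Dict[str, str]) -> Callable[[State], State]:
    """Return a single reusable 'layer' (hypothetical stand-in for the shared block)."""

    def block(state: State) -> State:
        # One application resolves one hop; entities without an
        # outgoing fact are returned unchanged.
        return facts.get(state, state)

    return block


def recurrent_depth_forward(
    block: Callable[[State], State], state: State, num_iterations: int
) -> State:
    """Apply the SAME block num_iterations times (weight tying across depth)."""
    for _ in range(num_iterations):
        state = block(state)
    return state


# Invented toy knowledge: a chain e0 -> e1 -> ... -> e10.
facts = {f"e{i}": f"e{i + 1}" for i in range(10)}
block = make_shared_block(facts)

shallow = recurrent_depth_forward(block, "e0", 3)   # "e3"
deep = recurrent_depth_forward(block, "e0", 10)     # "e10"
```

Because no new parameters are added per iteration, the iteration count can be raised at inference time beyond whatever was used in training. Whether a trained model actually benefits from that is the empirical question the paper examines; the sketch makes no claim about it.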

Abstract

We study implicit reasoning, i.e. the ability to combine knowledge or rules within a single forward pass. While transformer-based large language models store substantial factual knowledge and rules, they often fail to compose this knowledge for implicit multi-hop reasoning, suggesting a lack of compositional generalization over their parametric knowledge. To address this limitation, we study recurrent-depth transformers, which enable iterative computation over the same transformer layers. We investigate two compositional generalization challenges under the implicit reasoning scenario: systematic generalization, i.e. combining knowledge that is never used for compositions during training, and depth extrapolation, i.e. generalizing from limited reasoning depth (e.g. training on up to 5-hop) to deeper compositions (e.g. 10-hop). Through controlled studies with models trained from scratch,…
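
The two generalization settings defined in the abstract can be read as two different held-out splits over multi-hop queries. The sketch below is one possible reading under stated assumptions, not the paper's data pipeline: the synthetic facts, the helper names (`build_facts`, `sources_on_path`, `make_splits`), and the choice of 20 entities with 5 held-out facts are all hypothetical. Only the 5-hop and 10-hop limits come from the abstract's example.

```python
import random
from typing import Dict, List, Set, Tuple

Example = Tuple[int, int, int]  # (start entity, number of hops, answer entity)


def build_facts(num_entities: int, seed: int = 0) -> Dict[int, int]:
    """Hypothetical atomic facts: each entity maps to one other entity."""
    rng = random.Random(seed)
    return {e: rng.randrange(num_entities) for e in range(num_entities)}


def sources_on_path(facts: Dict[int, int], start: int, hops: int) -> List[int]:
    """Entities whose atomic fact is used when following `hops` hops from `start`."""
    path, current = [], start
    for _ in range(hops):
        path.append(current)
        current = facts[current]
    return path


def make_splits(
    facts: Dict[int, int],
    held_out: Set[int],
    train_max_hops: int = 5,   # abstract's example: train on up to 5-hop
    test_max_hops: int = 10,   # abstract's example: test on up to 10-hop
) -> Dict[str, List[Example]]:
    splits: Dict[str, List[Example]] = {
        "train": [], "systematic_test": [], "depth_test": []
    }
    for start in facts:
        # Every atomic (1-hop) fact is seen in training, including held-out ones.
        splits["train"].append((start, 1, facts[start]))
        for hops in range(2, test_max_hops + 1):
            path = sources_on_path(facts, start, hops)
            answer = facts[path[-1]]
            uses_held_out = any(src in held_out for src in path)
            if hops <= train_max_hops:
                # Held-out facts never take part in a training composition.
                key = "systematic_test" if uses_held_out else "train"
            elif not uses_held_out:
                # Deeper than anything trained on, using only familiar facts.
                key = "depth_test"
            else:
                continue  # mixes both challenges; left out of this sketch
            splits[key].append((start, hops, answer))
    return splits


facts = build_facts(20)
held_out = set(range(5))
splits = make_splits(facts, held_out)
```

Keeping the two test sets disjoint in this way lets each challenge be measured separately: `systematic_test` varies which facts are composed while staying within trained depths, and `depth_test` varies depth while staying within facts already used in training compositions.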
