Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
Hung-Hsuan Chen

TL;DR
This paper introduces a depth-recurrent Transformer architecture that enables variable-depth reasoning by iteratively applying shared weights, improving compositional generalization across diverse tasks.
Contribution
The authors propose a depth-recurrent Transformer with three mechanisms that keep deep recurrence stable. This lets the model reason beyond the limit of fixed-depth models and provides insight into out-of-distribution (OOD) generalization.
Findings
Achieves near-perfect performance with increased reasoning steps
Demonstrates different generalization behaviors across tasks
Reveals a computational frontier where performance sharply improves
Abstract
Standard Transformers have a fixed computational depth, fundamentally limiting their ability to generalize to tasks requiring variable-depth reasoning, such as multi-hop graph traversal or nested logic. We propose a depth-recurrent Transformer that decouples computational depth from parameter count by iteratively applying a shared-weight Transformer block in latent space -- enabling the model to trade recurrence steps for deeper reasoning at inference time. Our architecture incorporates three mechanisms to make deep recurrence (20+ steps) stable: (1) a silent thinking objective that supervises only the final output, forcing genuine multi-step reasoning rather than intermediate heuristic shortcuts; (2) LayerScale initialization to protect fragile reasoning states from untrained layer noise; and (3) an identity-biased recurrence that creates a gradient highway across many steps. We…
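The recurrence described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a residual update of the form h <- h + gamma * f(h), where f is a single shared-weight block and gamma is a small LayerScale-style coefficient. A tanh layer stands in for the Transformer block, and the names `make_block`, `recur`, and `final_step_loss` are hypothetical. The sketch shows how the step count can vary while the parameters stay fixed, and how a loss on the final state alone matches the "silent thinking" idea.

```python
import math
import random


def make_block(dim, seed=0):
    """Build one shared-weight block (a tanh layer standing in for a Transformer block)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def block(h):
        # Every output coordinate lies in (-1, 1) because of tanh.
        return [math.tanh(sum(weights[i][j] * h[j] for j in range(dim)))
                for i in range(dim)]

    return block


def recur(block, h0, steps, gamma):
    """Apply the same block `steps` times with an identity-biased residual update.

    Assumed update rule: h <- h + gamma * block(h). A small gamma keeps each
    step close to the identity map, so the state changes slowly.
    """
    h = list(h0)
    for _ in range(steps):
        update = block(h)
        h = [hi + gamma * ui for hi, ui in zip(h, update)]
    return h


def final_step_loss(block, readout, h0, target, steps, gamma):
    """Squared error computed on the final state only; intermediate states get no loss."""
    h = recur(block, h0, steps, gamma)
    prediction = sum(r * x for r, x in zip(readout, h))
    return (prediction - target) ** 2


if __name__ == "__main__":
    block = make_block(dim=4)
    h0 = [0.5, -0.25, 1.0, 0.0]
    # Same parameters, different amounts of computation at inference time.
    for steps in (1, 5, 20):
        print(steps, recur(block, h0, steps, gamma=1e-4))
```

Because the stand-in block is bounded by 1 in each coordinate, the state after T steps differs from the input by at most T * gamma per coordinate. That bound is a property of this toy block only, not a claim about the paper's architecture.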
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Data Visualization and Analytics
