TL;DR
This paper explores how recurrent-depth transformers improve implicit multi-hop reasoning and generalization, enabling models to compose knowledge beyond training depths through iterative computation.
Contribution
It studies recurrent-depth transformers, which iterate over the same transformer layers, as a way to improve compositional generalization and depth extrapolation, addressing limitations of vanilla transformers in implicit reasoning tasks.
Findings
Recurrent-depth transformers outperform vanilla transformers in systematic generalization.
Scaling inference-time recurrence improves depth extrapolation capabilities.
Training strategies influence the effectiveness of depth extrapolation and reveal overthinking limitations.
Abstract
We study implicit reasoning, i.e., the ability to combine knowledge or rules within a single forward pass. While transformer-based large language models store substantial factual knowledge and rules, they often fail to compose this knowledge for implicit multi-hop reasoning, suggesting a lack of compositional generalization over their parametric knowledge. To address this limitation, we study recurrent-depth transformers, which enable iterative computation over the same transformer layers. We investigate two compositional generalization challenges under the implicit reasoning scenario: systematic generalization, i.e., combining knowledge that is never used for compositions during training, and depth extrapolation, i.e., generalizing from limited reasoning depth (e.g., training on up to 5-hop) to deeper compositions (e.g., 10-hop). Through controlled studies with models trained from scratch,…
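As a rough illustration of the idea of iterating over the same layers, the sketch below applies one shared function repeatedly, with each pass resolving a single hop. It is a toy analogy under stated assumptions, not the paper's architecture: the FACTS table, shared_block, and recurrent_depth are hypothetical names introduced here, and a dictionary lookup stands in for a transformer block.

```python
from typing import Callable, Dict

# Hypothetical toy knowledge base: each fact maps an entity to its successor.
# In a real model these facts would live in the parameters, not in a dict.
FACTS: Dict[str, str] = {"a": "b", "b": "c", "c": "d", "d": "e"}


def shared_block(state: str) -> str:
    """Stand-in for one block with fixed weights: resolve a single hop.

    Entities with no outgoing fact are returned unchanged.
    """
    return FACTS.get(state, state)


def recurrent_depth(state: str,
                    block: Callable[[str], str],
                    num_iterations: int) -> str:
    """Apply the SAME block num_iterations times (weights are reused)."""
    for _ in range(num_iterations):
        state = block(state)
    return state


# Two passes compose two facts; four passes compose four facts.
two_hop = recurrent_depth("a", shared_block, 2)
four_hop = recurrent_depth("a", shared_block, 4)
print(two_hop, four_hop)
```

Because the block is reused rather than stacked, the number of passes can be chosen at inference time, which is the intuition behind asking for deeper compositions than those seen in training. The toy does not capture learning, attention, or the failure modes the paper reports.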
