A Mechanistic Analysis of Looped Reasoning Language Models
Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong

TL;DR
This paper provides a mechanistic analysis of looped reasoning language models. It reveals how their internal dynamics and fixed points differ from those of standard feedforward models and offers insights for architectural improvements.
Contribution
It investigates the fixed points and cyclic trajectories in the latent states of looped language models and relates these dynamics to the stages of inference observed in feedforward models.
Findings
Each layer in the recurrent cycle converges to a distinct fixed point, which stabilizes attention-head behavior.
Recurrent blocks learn stages of inference that mirror those of feedforward models.
Design choices such as model size and normalization affect the stability of these fixed points.
Abstract
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached,…
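The abstract's central claim is that each layer in a looped block settles to its own fixed point, so the block keeps tracing the same cycle in latent space. A toy model can make this concrete. The sketch below is not the authors' setup. It assumes a hypothetical recurrent block of three contractive layers of the form tanh(W h + b) acting on an 8-dimensional state, with no attention, residual stream, or normalization. All names (make_layer, loop_block, per_layer_drift) are illustrative. Because every toy layer is a contraction, the composed block has a unique fixed point. The state after layer i therefore converges to its own limit s_i*, and the latent trajectory becomes the repeating cycle s_1* -> s_2* -> s_3* -> s_1*.

```python
import math
import random


def make_layer(dim, weight_scale, rng):
    """Hypothetical toy layer h -> tanh(W h + b).

    With |W_ij| <= weight_scale, the spectral norm of W is at most
    dim * weight_scale. Since tanh is 1-Lipschitz, the layer is a
    contraction whenever dim * weight_scale < 1.
    """
    W = [[rng.uniform(-weight_scale, weight_scale) for _ in range(dim)]
         for _ in range(dim)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def layer(h):
        return [math.tanh(sum(W[i][j] * h[j] for j in range(dim)) + b[i])
                for i in range(dim)]
    return layer


def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loop_block(layers, h0, num_loops):
    """Apply the recurrent block num_loops times.

    Returns states[t][i], the latent state after layer i in loop t.
    """
    states, h = [], h0
    for _ in range(num_loops):
        loop_states = []
        for layer in layers:
            h = layer(h)
            loop_states.append(h)
        states.append(loop_states)
    return states


def per_layer_drift(states):
    """For each loop t >= 1, the largest change at any layer since loop t - 1."""
    return [max(distance(curr, prev) for curr, prev in zip(states[t], states[t - 1]))
            for t in range(1, len(states))]


rng = random.Random(0)
DIM, NUM_LAYERS, NUM_LOOPS = 8, 3, 30
layers = [make_layer(DIM, 0.05, rng) for _ in range(NUM_LAYERS)]  # 8 * 0.05 < 1
h0 = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

states = loop_block(layers, h0, NUM_LOOPS)
drift = per_layer_drift(states)
fixed_points = states[-1]  # approximate per-layer fixed points s_1*, ..., s_k*

print(f"drift after loop 1: {drift[0]:.2e}, after loop {NUM_LOOPS - 1}: {drift[-1]:.2e}")
for i in range(NUM_LAYERS):
    for j in range(i + 1, NUM_LAYERS):
        gap = distance(fixed_points[i], fixed_points[j])
        print(f"|s_{i + 1}* - s_{j + 1}*| = {gap:.3f}")
```

The per-layer drift (distance between the states at the same layer in consecutive loops) falls toward zero, while the limits reached by different layers stay far apart. In the looped models the paper studies, convergence is an empirical observation rather than a guarantee from contraction, and this drift measure is only one simple diagnostic one might use, not necessarily the paper's metric.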
