Stability and Generalization in Looped Transformers
Asher Labovich

TL;DR
This paper introduces a fixed-point framework to analyze the stability and generalization of looped transformers, demonstrating how architectural choices affect their ability to extrapolate to harder problems.
Contribution
It provides a theoretical analysis of looped transformer stability and introduces internal recall, showing how normalization influences their extrapolation capabilities.
Findings
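Looped networks without recall have countable fixed points and cannot achieve strong input-dependence in any spectral regime.
Recall combined with outer normalization yields fixed points that are reachable, locally smooth in the input, and supported by stable backpropagation.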
Internal recall with normalization improves performance on tasks like Sudoku.
Abstract
Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test time rather than memorize training-specific solutions. We introduce a fixed-point based framework for analyzing looped architectures along three axes of stability -- reachability, input-dependence, and geometry -- and use it to characterize when fixed-point iteration yields meaningful predictions. Theoretically, we prove that looped networks without recall have countable fixed points and cannot achieve strong input-dependence at any spectral regime, while recall combined with outer normalization reliably produces a regime in which fixed points are simultaneously reachable, locally smooth in the input, and supported by stable backpropagation. Empirically, we train single-layer looped…
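To make the fixed-point view concrete, here is a minimal toy sketch of a looped update with and without recall. It is not the paper's architecture: the tanh update, the matrices `W` and `U`, and the helper names `step` and `iterate` are all assumptions chosen for illustration, and attention and outer normalization are omitted. Because `W` is contractive in this toy, the no-recall loop forgets its input and collapses to the same fixed point, while the recall loop (which re-injects `x` at every iteration) settles on a fixed point that varies with `x`.

```python
import math

def step(W, U, h, x, recall):
    """One loop iteration: tanh(W h + U x) with recall, tanh(W h) without."""
    n = len(h)
    out = []
    for i in range(n):
        pre = sum(W[i][j] * h[j] for j in range(n))
        if recall:
            # Recall: the input is re-injected at every iteration.
            pre += sum(U[i][j] * x[j] for j in range(len(x)))
        out.append(math.tanh(pre))
    return out

def iterate(W, U, x, recall, max_iters=200, tol=1e-10):
    """Iterate until successive states differ by less than tol (max norm)."""
    # Without recall the input can only enter through the initial state.
    h = [0.0] * len(x) if recall else list(x)
    for t in range(max_iters):
        h_next = step(W, U, h, x, recall)
        residual = max(abs(a - b) for a, b in zip(h_next, h))
        h = h_next
        if residual < tol:
            return h, t + 1
    return h, max_iters

# Hypothetical weights: each row of W has absolute sum 0.5, so the update
# is a contraction in the max norm (tanh is 1-Lipschitz).
W = [[0.3, -0.2], [0.1, 0.4]]
U = [[1.0, 0.0], [0.0, 1.0]]
x_a = [0.5, -0.5]
x_b = [-0.8, 0.9]

for name, x in (("x_a", x_a), ("x_b", x_b)):
    h_recall, n_recall = iterate(W, U, x, recall=True)
    h_plain, n_plain = iterate(W, U, x, recall=False)
    print(name, "recall:", h_recall, "iters:", n_recall)
    print(name, "no recall:", h_plain, "iters:", n_plain)
```

The contractive regime is only one case; the abstract's claim covers every spectral regime, which this toy does not attempt to demonstrate.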
