The Topological Trouble With Transformers
Michael C. Mozer, Shoaib Ahmed Siddiqui, Rosanne Liu

TL;DR
This paper argues that purely feedforward transformers are fundamentally limited at dynamic state tracking and advocates recurrent architectures to support temporally extended cognition.
Contribution
The paper introduces a taxonomy of recurrent transformer architectures and suggests research directions for better state integration in foundation models.
Findings
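Feedforward transformers struggle with dynamic state tracking: each new input step pushes the evolving state representation deeper into the layer stack, which eventually exhausts the model's depth.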
Recurrent architectures can better support temporal cognition by maintaining implicit activation dynamics.
The paper outlines a taxonomy categorizing recurrent transformer designs by recurrence axis and input-to-recurrence ratio.
Abstract
Transformers encode structure in sequences via an expanding contextual history. However, their purely feedforward architecture fundamentally limits dynamic state tracking. State tracking -- the iterative updating of latent variables reflecting an evolving environment -- involves inherently sequential dependencies that feedforward networks struggle to maintain. Consequently, feedforward models push evolving state representations deeper into their layer stack with each new input step, rendering information inaccessible in shallow layers and ultimately exhausting the model's depth. While this depth limit can be bypassed by dynamic depth models and by explicit or latent thinking that externalizes state representations, these solutions are computationally and memory inefficient. In this article, we argue that temporally extended cognition requires refocusing from explicit thought traces to…
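The depth argument in the abstract can be illustrated with a deliberately crude toy, sketched here under stated assumptions rather than taken from the paper. The task (tracking a ball under cup swaps) and the names `swap_update`, `recurrent_track`, and `feedforward_track` are all hypothetical. The toy assumes that a feedforward stack integrates at most one sequential state update per layer, which is a caricature of the paper's claim and not its formal argument.

```python
from typing import List, Optional, Tuple

Swap = Tuple[int, int]


def swap_update(position: int, a: int, b: int) -> int:
    """One state update: move the ball if its cup is swapped."""
    if position == a:
        return b
    if position == b:
        return a
    return position


def recurrent_track(swaps: List[Swap], start: int = 0) -> int:
    """Recurrent view: a fixed-size state is carried across all steps."""
    position = start
    for a, b in swaps:
        position = swap_update(position, a, b)
    return position


def feedforward_track(swaps: List[Swap], depth: int, start: int = 0) -> Optional[int]:
    """Toy feedforward view (assumption): the state after step t lives at
    layer t, so a stack of `depth` layers cannot represent step depth + 1."""
    position = start
    for layer, (a, b) in enumerate(swaps, start=1):
        if layer > depth:
            return None  # depth exhausted: the state cannot be pushed deeper
        position = swap_update(position, a, b)
    return position


swaps = [(0, 1), (1, 2), (0, 2)]
print(recurrent_track(swaps))             # state after all three swaps
print(feedforward_track(swaps, depth=3))  # enough layers for three steps
print(feedforward_track(swaps, depth=2))  # too shallow for three steps
```

In this toy, the recurrent tracker handles any sequence length with constant state, while the depth-limited tracker fails as soon as the number of sequential updates exceeds its layer count.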
