Structured World Representations in Maze-Solving Transformers
Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung

TL;DR
This paper studies small transformer models trained to solve mazes. It finds structured internal representations of maze topology and valid paths, and identifies attention heads involved in path-following, which helps clarify how transformers process spatial tasks.
Contribution
The paper shows that small transformers form structured internal representations of maze topology and paths: the maze can be linearly decoded from the residual stream of a single token, and specific attention heads are involved in path-following.
Findings
Residual stream encodes entire maze topology
Token embeddings exhibit spatial structure
Attention heads are involved in path-following
Abstract
Transformer models underpin many recent advances in practical machine learning applications, yet understanding their internal behavior continues to elude researchers. Given the size and complexity of these models, forming a comprehensive picture of their inner workings remains a significant challenge. To this end, we set out to understand small transformer models in a more tractable setting: that of solving mazes. In this work, we focus on the abstractions formed by these models and find evidence for the consistent emergence of structured internal representations of maze topology and valid paths. We demonstrate this by showing that the residual stream of only a single token can be linearly decoded to faithfully reconstruct the entire maze. We also find that the learned embeddings of individual tokens have spatial structure. Furthermore, we take steps towards deciphering the circuitry of…
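The linear-decoding claim in the abstract can be illustrated with a minimal probe sketch. This is not the authors' code: the activations below are synthetic stand-ins for a single token's residual stream, and the names (`fit_linear_probe`, `probe_predict`), the maze size, the model width, and the noise level are all assumptions chosen for illustration. The sketch only shows what "linearly decodable" means in practice: a single affine map, fit by ridge regression, recovers which maze edges are open.

```python
import numpy as np

def fit_linear_probe(acts, labels, ridge=1e-3):
    """Fit a ridge-regression probe from activations to per-edge 0/1 labels."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ labels)

def probe_predict(W, acts):
    """Threshold the probe output at 0.5 to get a predicted edge set."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    return (X @ W > 0.5).astype(int)

# Hypothetical setup: a 4x4 grid has 24 possible edges between adjacent cells.
rng = np.random.default_rng(0)
n_mazes, d_model, n_edges = 2000, 64, 24

# Synthetic "mazes": each edge is independently open (1) or closed (0).
edges = rng.integers(0, 2, size=(n_mazes, n_edges))

# Synthetic "residual stream": an assumed linear encoding of the edges plus noise.
encoder = rng.normal(size=(n_edges, d_model))
acts = edges @ encoder + 0.1 * rng.normal(size=(n_mazes, d_model))

# Fit on one split, evaluate per-edge accuracy on held-out mazes.
train, test = slice(0, 1500), slice(1500, None)
W = fit_linear_probe(acts[train], edges[train])
accuracy = (probe_predict(W, acts[test]) == edges[test]).mean()
print(f"held-out per-edge accuracy: {accuracy:.3f}")
```

With a real model, `acts` would be replaced by residual-stream vectors extracted at one token position and `edges` by the ground-truth connectivity of each maze. High held-out accuracy from a probe this simple would suggest the information is linearly accessible, though it would not by itself show that the model uses it.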
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural dynamics and brain function · Cell Image Analysis Techniques
Methods: Sparse Evolutionary Training · Focus
