Transformers Linearly Represent Highly Structured World Models

Roman Kniazev; Nathanaël Fijalkow

arXiv:2605.18847 · cs.LG · May 20, 2026

TL;DR

This paper shows that transformers trained on Sudoku solving traces develop structured internal world models aligned with the puzzle's constraints, and it identifies a sparse, interpretable decision circuit that the model uses to place digits.

Contribution

The study shows that transformers build internal representations organized around the rows, columns, and boxes on which Sudoku's constraints act, and it identifies a specific, interpretable circuit in the final MLP layer that drives digit placement.

Findings

1. Transformers organize information around Sudoku's constraint units (rows, columns, and boxes) rather than around individual cells.

2. A dedicated set of neurons in the final MLP layer detects when only one digit remains possible for a cell and promotes that digit.

3. The geometry of the internal model is shaped by the domain's algebraic structure.
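
The first finding refers to Sudoku's 27 constraint units: 9 rows, 9 columns, and 9 3×3 boxes, with every cell belonging to exactly three of them. As a rough illustration only (not the authors' code; the names `constraint_units`, `units_of`, `unit_digits`, and `candidates` are hypothetical), the following Python sketch stores the board as per-unit sets of placed digits and recovers a cell's remaining candidates purely from the three units it belongs to, without consulting any per-cell table.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col), both 0-indexed
Grid = List[List[int]]  # 9x9 board, 0 marks an empty cell


def constraint_units() -> List[List[Cell]]:
    """The 27 units Sudoku's constraints act on: 9 rows, 9 columns, 9 boxes."""
    rows = [[(r, c) for c in range(9)] for r in range(9)]
    cols = [[(r, c) for r in range(9)] for c in range(9)]
    boxes = [
        [(3 * br + dr, 3 * bc + dc) for dr in range(3) for dc in range(3)]
        for br in range(3)
        for bc in range(3)
    ]
    return rows + cols + boxes


def units_of(cell: Cell) -> Tuple[int, int, int]:
    """Indices into constraint_units() of the row, column and box holding `cell`."""
    r, c = cell
    return r, 9 + c, 18 + 3 * (r // 3) + c // 3


def unit_digits(grid: Grid) -> List[Set[int]]:
    """Substructure view of the board: the set of digits placed in each unit."""
    return [{grid[r][c] for r, c in unit} - {0} for unit in constraint_units()]


def candidates(grid: Grid, cell: Cell) -> Set[int]:
    """Digits still allowed in `cell`, read off its three units' digit sets only."""
    r, c = cell
    if grid[r][c] != 0:
        return set()  # cell is already filled
    per_unit = unit_digits(grid)
    used = set().union(*(per_unit[u] for u in units_of(cell)))
    return set(range(1, 10)) - used
```

For example, on an otherwise empty board whose first row reads `0 2 3 4 5 6 7 8 9`, `candidates(grid, (0, 0))` is `{1}`. The point of the sketch is only that a unit-level representation is sufficient to determine cell-level possibilities; it makes no claim about how the model encodes these sets.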

Abstract

Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror the structure of the domain? We train an 8-layer transformer on Sudoku solving traces and perform a mechanistic analysis of its internal computation. We establish two results. First, the model builds a substructure world model: it does not represent the board state cell by cell, as a human analyst would expect, but organizes information around the rows, columns, and boxes that Sudoku's constraints act on. Second, we identify a naked-single circuit: a small set of dedicated neurons in the final MLP layer, each individually detecting when exactly one digit remains possible for a specific cell, and reliably promoting that digit. These findings show that the geometry of an emergent world model is shaped by the…
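
The naked-single condition described above is a standard Sudoku rule: an empty cell whose row, column, and box together already contain eight distinct digits, leaving exactly one possibility. The Python sketch below is a hedged, symbolic illustration of that rule only; it says nothing about how the 8-layer model implements it, and `cell_candidates` and `naked_singles` are hypothetical names.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]  # 9x9 board, 0 marks an empty cell


def cell_candidates(grid: Grid, r: int, c: int) -> Set[int]:
    """Digits not yet used in the row, column, or 3x3 box of an empty cell."""
    if grid[r][c] != 0:
        return set()
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the cell's box
    used = (
        set(grid[r])
        | {grid[i][c] for i in range(9)}
        | {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    )
    return set(range(1, 10)) - used  # 0 in `used` is harmless here


def naked_singles(grid: Grid) -> List[Tuple[int, int, int]]:
    """All (row, col, digit) triples where exactly one digit remains possible."""
    found = []
    for r in range(9):
        for c in range(9):
            cands = cell_candidates(grid, r, c)
            if len(cands) == 1:
                found.append((r, c, next(iter(cands))))
    return found
```

Per the abstract, each dedicated neuron detects this condition for one specific cell and promotes the forced digit; in the sketch, the corresponding per-cell check is simply `len(cands) == 1`.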
