MetaOthello: A Controlled Study of Multiple World Models in Transformers

Aviral Chawla; Galen Hall; Juniper Lovato

arXiv:2602.23164·cs.LG·February 27, 2026

TL;DR

MetaOthello uses a suite of Othello variants to investigate how transformers organize multiple, potentially conflicting world models within a shared representation space, and finds both shared and variant-specialized internal representations.

Contribution

Introduces MetaOthello, a controlled benchmark for studying multiple world models in transformers, and provides insights into their shared and layered organization across variants.

Findings

01. Transformers trained on multiple variants develop shared state representations.

02. Linear probes can transfer causally across variants, indicating shared internal states.

03. Representations are equivalent up to orthogonal rotations for isomorphic games.
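
To make the third finding concrete, the following is a minimal sketch of how "equivalent up to an orthogonal rotation" can be checked for two sets of activations. It uses the standard orthogonal Procrustes solution; whether the paper uses this exact procedure is an assumption. The activations here are synthetic stand-ins, and the names `orthogonal_procrustes`, `alignment_error`, `acts_a`, and `acts_b` are hypothetical.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal matrix R that minimises ||A @ R - B||_F."""
    # Closed-form solution: take the SVD of the cross-covariance A^T B
    # and drop the singular values, leaving R = U @ Vt.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def alignment_error(A, B):
    """Relative residual after the best orthogonal alignment of A onto B."""
    R = orthogonal_procrustes(A, B)
    return np.linalg.norm(A @ R - B) / np.linalg.norm(B)

# Synthetic stand-in for activations (assumption: rows are positions,
# columns are hidden dimensions). acts_b is an exact rotation of acts_a.
rng = np.random.default_rng(0)
d = 16
acts_a = rng.normal(size=(500, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
acts_b = acts_a @ Q

print(alignment_error(acts_a, acts_b))
```

A residual near zero indicates that the two representations differ only by a rotation or reflection; unrelated activations would leave a large residual after alignment.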

Abstract

Foundation models must handle multiple generative processes, yet mechanistic interpretability largely studies capabilities in isolation; it remains unclear how a single transformer organizes multiple, potentially conflicting "world models". Previous experiments on Othello-playing neural networks test world-model learning but focus on a single game with a single set of rules. We introduce MetaOthello, a controlled suite of Othello variants with shared syntax but different rules or tokenizations, and train small GPTs on mixed-variant data to study how multiple world models are organized in a shared representation space. We find that transformers trained on mixed-game data do not partition their capacity into isolated sub-models; instead, they converge on a mostly shared board-state representation that transfers causally across variants. Linear probes trained on one variant can intervene…
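
The idea of a linear probe that can also be used to intervene can be illustrated with a small sketch. Everything here is an assumption made for illustration: the activations are synthetic, a single board square's state is encoded as ±1 along a hidden direction, and the probe is fit by least squares. The names `probe_predict` and `intervene` are hypothetical, and the check covers only the probe's own readout, not the downstream move predictions that a causal test on a real model would measure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 2000

# Hypothetical ground truth: one square's state (+1 or -1) is written
# along a single hidden direction, plus isotropic noise.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
labels = rng.choice([-1.0, 1.0], size=n)
acts = rng.normal(scale=0.3, size=(n, d)) + labels[:, None] * true_dir

# Fit a linear probe by least squares so that acts @ w approximates labels.
w, *_ = np.linalg.lstsq(acts, labels, rcond=None)

def probe_predict(x, w):
    """Decode the square state from a batch of activations."""
    return np.sign(x @ w)

def intervene(x, w, target):
    """Shift each row of x along the probe direction until its readout equals target."""
    u = w / np.linalg.norm(w)
    # Solve (x + alpha * u) @ w == target for alpha, row by row.
    alpha = (target - x @ w) / (u @ w)
    return x + alpha[:, None] * u

accuracy = np.mean(probe_predict(acts, w) == labels)
flipped = intervene(acts, w, -labels)
print(accuracy, np.mean(probe_predict(flipped, w) == -labels))
```

In a real experiment the edited activations would be patched back into the transformer, and the intervention would count as causal only if the model's predicted legal moves changed to match the edited board state.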

Taxonomy

Topics: Artificial Intelligence in Games · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation