Better World Models Can Lead to Better Post-Training Performance
Prakhar Gupta, Henry Conklin, Sarah-Jane Leslie, Andrew Lee

TL;DR
This paper investigates how explicit world-modeling objectives during training enhance the internal representations of Transformers and improve their post-training performance on complex tasks like solving a Rubik's Cube.
Contribution
It demonstrates that explicit world-model pretraining improves internal representations and post-training performance, especially on difficult tasks, compared to standard next-token prediction.
Findings
Explicit world-modeling yields more linearly decodable and causally steerable state representations.
Better representations lead to higher gains in post-training performance.
Improved state representations particularly benefit harder cube states.
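To make "decodable" concrete, here is a minimal sketch of a linear probe, the tool the abstract names for measuring representation quality. It is not the authors' code: the activations are synthetic stand-ins, and the six-way label (for example, the color of one sticker), the hidden size, and the helper names `fit_linear_probe` and `probe_accuracy` are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_probe(hidden, labels, num_classes, ridge=1e-3):
    """Fit a ridge-regularized linear map from hidden states to one-hot labels."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    Y = np.eye(num_classes)[labels]
    # Closed-form ridge solution: W = (X^T X + ridge * I)^-1 X^T Y
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def probe_accuracy(weights, hidden, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    preds = (X @ weights).argmax(axis=1)
    return float((preds == labels).mean())

# Synthetic stand-in for Transformer activations (assumption, not real data):
# each class has its own direction in activation space, plus small noise.
rng = np.random.default_rng(0)
num_classes, dim, n = 6, 32, 1200
directions = rng.normal(size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=n)
hidden = directions[labels] + 0.1 * rng.normal(size=(n, dim))

# Fit on the first 1000 examples, evaluate on the held-out 200.
train, test = slice(0, 1000), slice(1000, n)
W = fit_linear_probe(hidden[train], labels[train], num_classes)
acc = probe_accuracy(W, hidden[test], labels[test])
print(f"held-out probe accuracy: {acc:.3f}")
```

High held-out accuracy means the labeled property can be read off the activations with a single linear map. In a real experiment `hidden` would be activations taken from a chosen layer of the trained model, and the probe would be compared across pretraining objectives.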
Abstract
In this work we study how explicit world-modeling objectives affect the internal representations and downstream capability of Transformers across different training stages. We use a controlled 2x2x2 Rubik's Cube and ask: (1) how does explicitly pretraining a world model affect the model's latent representations, and (2) how does world-model quality affect the model's performance after reinforcement learning post-training? We compare standard next-token prediction to two explicit world-modeling strategies -- (i) state-prediction pretraining and (ii) a joint state-prediction + next-token objective -- and assess task performance after Group Relative Policy Optimization (GRPO) is applied as post-training. We evaluate the representation quality with linear probes and causal interventions. We find that explicit world-modeling yields more linearly decodable and causally steerable state…
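As background on the post-training step, the core of GRPO is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative advantage computation. The binary solved/unsolved reward, the use of the population standard deviation, and the function name `group_relative_advantages` are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four rollouts from one scrambled cube:
# reward 1.0 if the rollout solved the cube, 0.0 otherwise (assumed reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)

# A group where every rollout fails carries no learning signal.
flat = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
print(flat)
```

Successful rollouts receive positive advantages and failed ones negative, so the policy update pushes probability toward the better samples within each group. When all rollouts in a group get the same reward, every advantage is zero and that prompt contributes no gradient.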
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Games
