Executable World Models for ARC-AGI-3 in the Era of Coding Agents
Sergey Rodionov

TL;DR
This paper presents a coding-agent system that maintains an executable Python world model, verifies it against past observations, refactors it toward simpler abstractions, and plans through it before acting. With no game-specific code, it fully solved 7 of the 25 public ARC-AGI-3 games.
Contribution
It introduces a verifier-driven executable world model approach for ARC-AGI-3, serving as a game-general baseline with no hand-coded game logic.
Findings
Solved 7 out of 25 ARC-AGI-3 games.
Achieved over 75% Relative Human Action Efficiency (RHAE) on 6 games.
Mean per-game RHAE of 32.58%.
Abstract
We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency…
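The verify-then-plan loop described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `verify`, `plan`, and `toy_model`, the transition format, and the use of breadth-first search are all assumptions made for illustration, and the toy 1-D track stands in for a real game.

```python
from collections import deque
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Action = str
Transition = Tuple[State, Action, State]  # (state, action, observed next state)
Model = Callable[[State, Action], State]  # hypothetical world-model interface


def verify(model: Model, history: Sequence[Transition]) -> List[Transition]:
    """Return every recorded transition that the model fails to reproduce."""
    return [(s, a, s2) for (s, a, s2) in history if model(s, a) != s2]


def plan(model: Model, start: State, is_goal: Callable[[State], bool],
         actions: Sequence[Action], max_depth: int = 20) -> Optional[List[Action]]:
    """Breadth-first search through the model (an assumed planner, chosen for brevity)."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = model(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # no plan found within max_depth


# Toy world (hypothetical): a token on cells 0..4; moves clamp at the walls.
def toy_model(pos: int, action: str) -> int:
    step = {"left": -1, "right": 1}[action]
    return min(4, max(0, pos + step))


history = [(0, "right", 1), (1, "right", 2), (2, "left", 1), (0, "left", 0)]
mismatches = verify(toy_model, history)

# Only plan through a model that reproduces every past observation.
route = plan(toy_model, 1, lambda s: s == 4, ["left", "right"]) if not mismatches else None
print(mismatches, route)
```

In this sketch, a non-empty `mismatches` list would be the signal to revise the model before acting; planning happens only once the model agrees with all recorded transitions.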
