Executable World Models for ARC-AGI-3 in the Era of Coding Agents

Sergey Rodionov

arXiv:2605.05138 · cs.AI · May 7, 2026

TL;DR

This paper presents a coding-agent system that maintains an executable Python world model, verifies it against past observations, refactors it toward simpler abstractions, and plans through it before acting. With no hand-coded game-specific logic, it fully solves 7 of the 25 public ARC-AGI-3 games.

Contribution

It introduces a verifier-driven executable world model approach for ARC-AGI-3, serving as a game-general baseline with no hand-coded game logic.

Findings

01. Fully solved 7 of the 25 public ARC-AGI-3 games.

02. Achieved over 75% Relative Human Action Efficiency (RHAE) on 6 games.

03. Mean per-game RHAE of 32.58%.

Abstract

We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency…
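
The verify-then-plan loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `verify`, `plan`, and `corridor_model`, the transition-tuple format, and the use of breadth-first search are all assumptions chosen for illustration, and the toy corridor environment is hypothetical rather than an ARC-AGI-3 game.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical types: a state is any hashable value, an action is a string,
# and a transition is (state, action, observed next state).
State = Hashable
Action = str
Transition = Tuple[State, Action, State]
Model = Callable[[State, Action], State]


def verify(model: Model, history: Iterable[Transition]) -> List[Transition]:
    """Replay recorded transitions and return those the model fails to reproduce."""
    return [(s, a, s2) for (s, a, s2) in history if model(s, a) != s2]


def plan(
    model: Model,
    start: State,
    actions: Sequence[Action],
    is_goal: Callable[[State], bool],
    max_depth: int = 20,
) -> Optional[List[Action]]:
    """Breadth-first search through the model; returns a shortest plan or None."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = model(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# Toy world model (assumption): a corridor with cells 0..4; the state is the position.
def corridor_model(pos: int, action: Action) -> int:
    if action == "LEFT":
        return max(0, pos - 1)
    if action == "RIGHT":
        return min(4, pos + 1)
    return pos


# Previously observed transitions, including one that bumps into the left wall.
history = [(0, "RIGHT", 1), (1, "RIGHT", 2), (0, "LEFT", 0)]

mismatches = verify(corridor_model, history)
# Plan only if the model reproduces every past observation.
actions_to_goal = (
    plan(corridor_model, 2, ["LEFT", "RIGHT"], lambda p: p == 4)
    if not mismatches
    else None
)
print(mismatches, actions_to_goal)
```

In this sketch a non-empty `mismatches` list would be the signal to revise the model before any planning happens, which mirrors the verifier-driven ordering the abstract describes. The refactoring step toward simpler abstractions is not modeled here.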
