Debugging code world models

Babak Rahmani

arXiv:2602.07672·cs.SE·February 17, 2026

Open Access

TL;DR

This paper investigates Code World Models (CWMs) and identifies two dominant failure modes: token-budget exhaustion on long execution traces and errors concentrated in string-valued state. It also shows that replacing generated actions with correct ones improves long-horizon state tracking.

Contribution

The paper analyzes the failure regimes of CWMs in detail and demonstrates that correct action generation significantly improves long-horizon state propagation.

Findings

01. Token-budget exhaustion limits long execution traces.

02. String state limitations stem from subword tokenization.

03. Correct action replacement improves long-horizon accuracy.

Abstract

Code World Models (CWMs) are language models trained to simulate program execution by predicting explicit runtime state after every executed command. This execution-based world modeling enables internal verification within the model, offering an alternative to natural language chain-of-thought reasoning. However, the sources of errors and the nature of CWMs' limitations remain poorly understood. We study CWMs from two complementary perspectives: local semantic execution and long-horizon state tracking. On real-code benchmarks, we identify two dominant failure regimes. First, dense runtime state reveals produce token-intensive execution traces, leading to token-budget exhaustion on programs with long execution histories. Second, failures disproportionately concentrate in string-valued state, which we attribute to limitations of subword tokenization rather than program structure. To study…
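As a rough illustration of what "explicit runtime state after every executed command" can look like, and why such traces grow quickly, the sketch below records a snapshot of local variables at every executed line of a small function using Python's `sys.settrace`. The helper names (`trace_states`, `running_sum`, `trace_length_chars`) are hypothetical, the serialization via `repr` is an assumption, and this is not the trace format used in the paper; character count is only a crude stand-in for token count.

```python
import sys


def trace_states(func, *args):
    """Run func(*args) and record (line offset, locals snapshot) at each line event.

    Each snapshot is taken just before the line executes, so it reflects the
    state left behind by the previously executed line.
    """
    records = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames other than the traced function
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            records.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier tracer
    return result, records


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def trace_length_chars(n):
    """Size of the serialized trace in characters (a crude proxy for tokens)."""
    _, records = trace_states(running_sum, n)
    return sum(len(repr(record)) for record in records)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, trace_length_chars(n))
```

Because every loop iteration contributes its own state snapshots, the serialized trace grows with the length of the execution history rather than with the length of the source code, which is the kind of growth that could exhaust a fixed token budget on long-running programs.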

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Parallel Computing and Optimization Techniques