Coding Agent Is Good As World Simulator
Hongyu Wang, Jingquan Wang, Bocheng Zou, Radu Serban, Dan Negrut

TL;DR
This paper introduces a physics-based world modeling framework that uses executable simulation code and multiple specialized agents to generate physically plausible and instruction-faithful visual simulations.
Contribution
The paper presents an agentic framework that combines planning, code generation, visual review, and physics analysis agents to improve the physical accuracy of world models.
Findings
The framework outperforms video-based models in physical accuracy.
It also achieves higher instruction fidelity and visual quality.
It is applicable to driving and robot simulation scenarios.
Abstract
World models have emerged as a powerful paradigm for building interactive simulation environments, with recent video-based approaches demonstrating impressive progress in generating visually plausible dynamics. However, because these models typically infer dynamics from video and represent them in latent states, they do not explicitly enforce physical constraints. As a result, the generated video rollouts are not physically plausible, exhibiting unstable contacts, distorted shapes, or inconsistent motion. In this paper, we present an agentic framework that constructs physics-based world models through executable simulation code. The framework coordinates planning, code generation, visual review, and physics analysis agents. The planning agent converts the natural language prompt into a structured scene plan, the code agent implements it as executable simulation code, and the visual review…
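The division of labor described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`ScenePlan`, `Review`, `plan_scene`, `write_code`, `visual_review`, `physics_analysis`, `build_world`) is hypothetical, the agents are plain stub functions rather than language-model calls, and the feedback loop that routes review notes back to the code agent is an assumption, since the abstract is cut off before it describes what the review and analysis agents do with their results.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ScenePlan:
    """Hypothetical structured scene plan produced by the planning agent."""
    prompt: str
    objects: List[str] = field(default_factory=list)


@dataclass
class Review:
    """Hypothetical feedback record from a review-style agent."""
    source: str
    passed: bool
    notes: str = ""


def plan_scene(prompt: str) -> ScenePlan:
    # Stand-in for the planning agent: a real one would call a language model.
    # Here we only pick out a few known object words from the prompt.
    known = ("ball", "ramp", "box", "car", "robot")
    words = prompt.lower().replace(",", " ").replace(".", " ").split()
    return ScenePlan(prompt=prompt, objects=[w for w in known if w in words])


def write_code(plan: ScenePlan, feedback: List[Review]) -> str:
    # Stand-in for the code agent: emits placeholder "simulation code" text.
    lines = [f"add_body('{name}')" for name in plan.objects]
    # Assumed behaviour: notes from failed reviews are addressed on revision.
    lines += [f"# addressed: {r.notes}" for r in feedback if not r.passed]
    return "\n".join(lines)


def visual_review(code: str) -> Review:
    # Stand-in check: the scene must contain at least one body.
    ok = "add_body" in code
    return Review("visual", ok, "" if ok else "scene is empty")


def physics_analysis(code: str) -> Review:
    # Stand-in check: pretend gravity must be mentioned somewhere in the code.
    ok = "gravity" in code
    return Review("physics", ok, "" if ok else "set gravity explicitly")


def build_world(prompt: str, max_rounds: int = 3) -> Tuple[str, int]:
    """Run plan -> code -> review rounds; return the code and rounds used."""
    plan = plan_scene(prompt)
    feedback: List[Review] = []
    code = ""
    for round_idx in range(max_rounds):
        code = write_code(plan, feedback)
        feedback = [visual_review(code), physics_analysis(code)]
        if all(r.passed for r in feedback):
            return code, round_idx + 1
    return code, max_rounds


if __name__ == "__main__":
    code, rounds = build_world("A ball rolls down a ramp.")
    print(f"accepted after {rounds} round(s):")
    print(code)
```

The point of the sketch is the control flow: the plan is fixed once, while the generated code is revised until both reviewers accept it or the round budget runs out. In a real system the stub checks would be replaced by rendering the simulation and inspecting its output.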
