From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models
Zhikang Chen, Tingting Zhu

TL;DR
This paper argues that effective world models should prioritize causal understanding and physical constraints over visual realism, and that decision-making tasks call for simulation that is structured, stable over long horizons, and intervention-aware.
Contribution
The paper reframes world models as actionable simulators built on structured interfaces and causal reasoning, rather than as visual engines judged by fidelity, with safety-critical decision-making as the motivating use case.
Findings
Visual realism does not guarantee understanding of physical dynamics.
Causal and constraint-aware models outperform purely visual models in long-term planning.
Medical decision-making tests show the importance of counterfactual reasoning in world models.
Abstract
A world model is an AI system that simulates how an environment evolves under actions, enabling planning through imagined futures rather than reactive perception. Current world models, however, suffer from visual conflation: the mistaken assumption that high-fidelity video generation implies an understanding of physical and causal dynamics. We show that while modern models excel at predicting pixels, they frequently violate invariant constraints, fail under intervention, and break down in safety-critical decision-making. This survey argues that visual realism is an unreliable proxy for world understanding. Instead, effective world models must encode causal structure, respect domain-specific constraints, and remain stable over long horizons. We propose a reframing of world models as actionable simulators rather than visual engines, emphasizing structured 4D interfaces, constraint-aware…
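The distinction the abstract draws between predicting plausible states and respecting invariant constraints over long horizons can be illustrated with a toy sketch. This is not the paper's method: the one-dimensional box with walls at 0 and 1, the unit mass, and every name below (`State`, `DT`, `unconstrained_step`, `constrained_step`, `rollout`, `count_violations`) are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

DT = 0.1  # simulation time step in seconds (illustrative value)


@dataclass(frozen=True)
class State:
    position: float  # metres; the invariant is 0.0 <= position <= 1.0
    velocity: float  # metres per second


StepFn = Callable[[State, float], State]


def unconstrained_step(state: State, force: float) -> State:
    """Integrate the force on a unit mass; the walls are ignored entirely."""
    velocity = state.velocity + force * DT
    position = state.position + velocity * DT
    return State(position, velocity)


def constrained_step(state: State, force: float) -> State:
    """Same dynamics, followed by a reflection off the walls at 0 and 1."""
    nxt = unconstrained_step(state, force)
    position, velocity = nxt.position, nxt.velocity
    if position < 0.0:
        position, velocity = -position, -velocity
    elif position > 1.0:
        position, velocity = 2.0 - position, -velocity
    # Clamp as a final guard so the invariant holds even for very fast states.
    position = min(max(position, 0.0), 1.0)
    return State(position, velocity)


def rollout(step: StepFn, start: State, forces: List[float]) -> List[State]:
    """Imagine a future: apply one action (a force) per step from `start`."""
    states = [start]
    for force in forces:
        states.append(step(states[-1], force))
    return states


def count_violations(states: List[State]) -> int:
    """Count imagined states that leave the box, i.e. break the invariant."""
    return sum(1 for s in states if not 0.0 <= s.position <= 1.0)


if __name__ == "__main__":
    start = State(position=0.5, velocity=0.0)
    forces = [1.0] * 200  # push right for 200 steps
    for name, step in [("unconstrained", unconstrained_step),
                       ("constrained", constrained_step)]:
        states = rollout(step, start, forces)
        print(name, "violations:", count_violations(states))
```

Both step functions agree on the first few steps, so a short-horizon comparison would not separate them; only the long rollout exposes that the unconstrained one carries the mass through the wall.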
Taxonomy
Topics: Embodied and Extended Cognition · Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics
