TL;DR
This paper introduces IPR (Interactive Physical Reasoner), a physics-centric interactive reasoning model trained on the G2U benchmark of 1,000+ heterogeneous games, and reports improved physical reasoning and zero-shot transfer to unseen games.
Contribution
The paper presents IPR, an interactive physical reasoner that uses world-model rollouts to score and reinforce a VLM's policy, together with PhysCode, a physics-centric action code, to improve physics and causality understanding across visually diverse games.
Findings
IPR outperforms existing models on the G2U benchmark.
Performance improves with more training data and interaction steps.
IPR transfers zero-shot to unseen games, indicating that it generalizes.
Abstract
Humans learn by observing, interacting with environments, and internalizing physics and causality. We ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. To study this, we introduce a Game-to-Unseen (G2U) benchmark of 1,000+ heterogeneous games that exhibit significant visual domain gaps. Existing approaches, including VLMs and world models, struggle to capture the underlying physics and causality because they do not focus on core mechanisms and overfit to visual details. VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyze physics and causality. We therefore propose IPR (Interactive Physical Reasoner), which uses world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code…
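The abstract says only that world-model rollouts are used to score and reinforce a VLM's policy; it does not give the procedure. The toy sketch below illustrates that general idea under loud assumptions and is not the authors' implementation: the "policy" is a plain logit vector rather than a VLM, the "world model" is a hand-written 1-D dynamics function rather than a learned one, and PhysCode is not modeled at all. Every name here (`world_model_step`, `rollout_score`, `reinforce_policy`, `ACTIONS`) is hypothetical.

```python
import math

# Hypothetical discrete action set: thrust applied on the first step.
ACTIONS = [-1.0, 0.0, 1.0]


def world_model_step(state, action):
    """Toy stand-in for a learned world model: 1-D position and velocity."""
    pos, vel = state
    vel = vel + action - 0.1  # thrust minus a constant gravity-like pull
    pos = pos + vel
    return (pos, vel)


def rollout_score(state, first_action, horizon=5, target=3.0):
    """Imagine `horizon` steps: apply first_action once, then coast with zero thrust."""
    s = world_model_step(state, first_action)
    for _ in range(horizon - 1):
        s = world_model_step(s, 0.0)
    return -abs(s[0] - target)  # ending closer to the target scores higher


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_policy(logits, scores, lr=0.5):
    """Raise logits of actions whose imagined score beats the policy-weighted mean."""
    probs = softmax(logits)
    baseline = sum(p * s for p, s in zip(probs, scores))
    return [l + lr * (s - baseline) for l, s in zip(logits, scores)]


state = (0.0, 0.0)
logits = [0.0, 0.0, 0.0]  # uniform stand-in policy over ACTIONS

# Score each candidate action by an imagined rollout, then update the policy.
scores = [rollout_score(state, a) for a in ACTIONS]
new_logits = reinforce_policy(logits, scores)
best_action = ACTIONS[max(range(len(ACTIONS)), key=lambda i: new_logits[i])]
```

In this toy setting the update shifts probability mass toward the action whose imagined trajectory ends nearest the target, which is the look-ahead that the abstract says VLM/VLA agents lack on their own.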
