TL;DR
PhysWorld introduces a physics-aware framework that synthesizes diverse demonstrations using a simulator to learn accurate, fast, and generalizable world models of deformable objects from limited real video data.
Contribution
The paper presents a novel method combining simulation-based demonstration synthesis with physical property optimization to improve deformable object modeling from limited data.
Findings
Achieves accurate future predictions for deformable objects.
Enables inference speeds 47 times faster than PhysTwin.
Generalizes well to new interactions.
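To put the speedup in terms of per-step latency, the short sketch below combines the 47× figure with the 799 FPS throughput quoted in the peer reviews on this page. It assumes both numbers come from the same comparison on the same hardware, which the page does not state. The baseline frame rate it derives is implied by that assumption, not a reported result. The function names are illustrative only.

```python
def implied_baseline_fps(model_fps: float, speedup: float) -> float:
    """Frame rate of the slower method implied by a speedup factor."""
    return model_fps / speedup


def latency_ms(fps: float) -> float:
    """Time per predicted frame, in milliseconds."""
    return 1000.0 / fps


# Figures quoted on this page; pairing them is an assumption.
model_fps = 799.0
speedup = 47.0

baseline_fps = implied_baseline_fps(model_fps, speedup)
print(baseline_fps)                        # 17.0
print(round(latency_ms(model_fps), 2))     # 1.25
print(round(latency_ms(baseline_fps), 2))  # 58.82
```

Under that assumption, each prediction step takes roughly 1.25 ms instead of about 59 ms. That difference matters for sampling-based planners, which evaluate many rollouts per control step.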
Abstract
Interactive world models that simulate object dynamics are crucial for robotics, VR, and AR. However, it remains a significant challenge to learn physics-consistent dynamics models from limited real-world video data, especially for deformable objects with spatially-varying physical properties. To overcome the challenge of data scarcity, we propose PhysWorld, a novel framework that utilizes a simulator to synthesize physically plausible and diverse demonstrations to learn efficient world models. Specifically, we first construct a physics-consistent digital twin within an MPM simulator via constitutive model selection and global-to-local optimization of physical properties. Subsequently, we apply part-aware perturbations to the physical properties and generate various motion patterns for the digital twin, synthesizing extensive and diverse demonstrations. Finally, using these demonstrations,…
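The abstract does not spell out how the part-aware perturbations are applied. As a rough illustration only, the sketch below assumes each simulated particle carries an integer part label and a scalar stiffness value. It treats a perturbation as one random multiplicative factor per part, drawn log-uniformly from a chosen range. The function name, the `scale_range` parameter, and the sampling scheme are all hypothetical; this is not the authors' implementation.

```python
import numpy as np


def perturb_by_part(stiffness, part_ids, scale_range=(0.5, 2.0), rng=None):
    """Scale per-particle stiffness with one random factor per part.

    Hypothetical sketch: particles sharing a part label receive the same
    factor, so variation is introduced between parts, not within them.
    """
    rng = np.random.default_rng() if rng is None else rng
    stiffness = np.asarray(stiffness, dtype=float)
    part_ids = np.asarray(part_ids)

    # Map every particle to the index of its part.
    parts, inverse = np.unique(part_ids, return_inverse=True)

    # Log-uniform sampling makes halving and doubling equally likely.
    low, high = scale_range
    factors = np.exp(rng.uniform(np.log(low), np.log(high), size=parts.shape[0]))

    perturbed = stiffness * factors[inverse]
    return perturbed, dict(zip(parts.tolist(), factors.tolist()))


# Six particles in three parts, all starting from the same stiffness.
base = np.full(6, 1.0e5)
labels = [0, 0, 1, 1, 2, 2]
perturbed, factors = perturb_by_part(base, labels, rng=np.random.default_rng(0))
print(perturbed)
print(factors)
```

Calling the function repeatedly with different random generators would yield a family of digital-twin variants, each of which could be simulated to produce additional demonstrations.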
Peer Reviews
Decision·Submitted to ICLR 2026
1. **Comprehensive and well-integrated framework** The paper thoughtfully combines VLM-based material selection, multi-stage parameter optimization, and physics-guided data augmentation into a coherent pipeline bridging simulation and learning.
2. **Strong accuracy–efficiency trade-off** The proposed GNN-based world model achieves high predictive accuracy while running at **799 FPS**, demonstrating its potential for real-time applications that require fast yet physically consistent in
1. **Lack of quantitative downstream (control) evaluation — a key remaining limitation** While the framework’s real-time capability is convincingly demonstrated (47× faster inference), the paper does not provide **quantitative results in downstream control or planning tasks**. The MPPI example (Fig. 4) is qualitative only. Demonstrating success rates, trajectory errors, or computation times in model-based control would substantially strengthen the practical significance. Given tha
The paper addresses a practical problem: learning usable dynamic models from limited, real-world data. The proposed method offers a compelling solution to the trade-off inherent in existing methods by combining the accuracy of a high-fidelity physics engine with the speed of a lightweight neural network. The demonstrated 47× speedup over the SOTA baseline is a massive practical gain that unlocks real-time applications, as evidenced by the successful MPPI planning experiment. P3-Pert (Part-aware Phy
1) Lacking robustness analysis for 3D data preprocessing: This paper proposes a pipeline for constructing world models from video. The entire pipeline begins with extraction of object point clouds from real interaction videos. This is a critical, non-trivial preprocessing step. The quality (density, noise, completeness) of this initial 3D tracking is fundamental to the accuracy of the digital twin optimization. The paper does not analyze the framework's sensitivity to this input quality. If the
* Well articulated approaches and easy to grasp the central idea;
* Modelling and predicting deformable object behavior is an important topic so the paper may bring about impacts;
* Promising to publish code and pre-trained models. Important for reproducibility of the work.
* All technical methods are well known and the paper reads like a concatenation of them without much scientific novelty or insights;
* Validation should be stronger. Especially to the point of using a GNN to amortize a simulation. Better to have more direct evidence to show the benefits of having a faster predictive model when the simulator is already there. See more elaborated points in questions.
* Validation should also be more extensive. I think this is especially the case given the paper is
