Building Explicit World Model for Zero-Shot Open-World Object Manipulation
Xiaotong Li, Gang Chen, and Javier Alonso-Mora

TL;DR
This paper introduces an explicit-world-model framework for zero-shot open-world object manipulation in robotics. It builds a digital twin of the environment to simulate and evaluate manipulation strategies, which enables generalization without task-specific demonstrations.
Contribution
The paper presents a novel explicit-world-model approach with a digital twin for zero-shot manipulation, reducing reliance on costly demonstrations and improving out-of-distribution generalization.
Findings
Achieves zero-shot manipulation without task-specific demonstrations
Successfully generalizes to multiple open-set tasks and objects
Demonstrates effective transfer from simulation to real-world deployment
Abstract
Open-world object manipulation remains a fundamental challenge in robotics. While Vision-Language-Action (VLA) models have demonstrated promising results, they rely heavily on large-scale robot action demonstrations, which are costly to collect and can hinder out-of-distribution generalization. In this paper, we propose an explicit-world-model-based framework for open-world manipulation that achieves zero-shot generalization by constructing a physically grounded digital twin of the environment. The framework integrates open-set perception, digital-twin reconstruction, and the sampling and evaluation of interaction strategies. Using this digital twin, our approach efficiently explores and evaluates manipulation strategies in a physics-enabled simulator and reliably deploys the chosen strategy to the real world. Experimentally, the proposed framework is able to perform…
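The sample-and-evaluate step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Strategy` type, the planar-push parameterization, the squared-distance cost, and `toy_rollout` (a stand-in for a physics-enabled simulator running on the digital twin) are all hypothetical names and assumptions introduced here for illustration only.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Strategy:
    """Hypothetical interaction strategy: a planar push (dx, dy)."""
    dx: float
    dy: float


def sample_strategies(n: int, rng: random.Random,
                      max_step: float = 1.0) -> List[Strategy]:
    """Draw n candidate pushes uniformly from a square of half-width max_step."""
    return [
        Strategy(rng.uniform(-max_step, max_step),
                 rng.uniform(-max_step, max_step))
        for _ in range(n)
    ]


def toy_rollout(start: Point, strategy: Strategy) -> Point:
    """Assumed stand-in for a simulator rollout: the object simply
    translates by the push vector (no contact or friction modeling)."""
    return (start[0] + strategy.dx, start[1] + strategy.dy)


def select_strategy(
    start: Point,
    goal: Point,
    candidates: List[Strategy],
    rollout: Callable[[Point, Strategy], Point] = toy_rollout,
) -> Tuple[Strategy, float]:
    """Evaluate every candidate in simulation and return the one whose
    predicted outcome is closest to the goal, with its cost."""
    def cost(s: Strategy) -> float:
        x, y = rollout(start, s)
        return (x - goal[0]) ** 2 + (y - goal[1]) ** 2

    best = min(candidates, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    candidates = sample_strategies(200, rng)
    best, best_cost = select_strategy((0.0, 0.0), (0.5, -0.25), candidates)
    print(best, best_cost)
```

In a real system, the rollout would be replaced by a physics simulation of the reconstructed scene, and only the selected strategy would be executed on the robot. Passing the rollout as a parameter keeps the selection logic independent of whichever simulator is used.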
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
