Building Explicit World Model for Zero-Shot Open-World Object Manipulation

Xiaotong Li, Gang Chen, and Javier Alonso-Mora

arXiv:2603.13825·cs.RO·March 17, 2026

TL;DR

This paper introduces an explicit-world-model framework for zero-shot open-world object manipulation. It builds a physically grounded digital twin of the environment, samples and evaluates interaction strategies in simulation, and deploys the chosen strategy on the real robot without task-specific demonstrations.

Contribution

The paper presents an explicit-world-model approach that uses a digital twin for zero-shot manipulation. It removes the need for large-scale robot action demonstrations, which are costly to collect and which the authors note can hinder out-of-distribution generalization.

Findings

1. Achieves zero-shot manipulation without task-specific demonstrations
2. Successfully generalizes to multiple open-set tasks and objects
3. Demonstrates effective transfer from simulation to real-world deployment

Abstract

Open-world object manipulation remains a fundamental challenge in robotics. While Vision-Language-Action (VLA) models have demonstrated promising results, they rely heavily on large-scale robot action demonstrations, which are costly to collect and can hinder out-of-distribution generalization. In this paper, we propose an explicit-world-model-based framework for open-world manipulation that achieves zero-shot generalization by constructing a physically grounded digital twin of the environment. The framework integrates open-set perception, digital-twin reconstruction, and the sampling and evaluation of interaction strategies. By constructing a digital twin of the environment, our approach efficiently explores and evaluates manipulation strategies in a physics-enabled simulator and reliably deploys the chosen strategy to the real world. Experimentally, the proposed framework is able to perform…
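The sample-and-evaluate step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D sliding model standing in for the physics simulator, the `Strategy` parameterization, and every function name below are hypothetical, chosen only to show the idea of scoring candidate interactions inside a simulated copy of the scene before committing to one.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Strategy:
    """Hypothetical interaction strategy: a push with a speed and a duration."""
    push_speed: float  # m/s at which the object is pushed
    duration: float    # seconds the push lasts


def simulate_in_twin(start: float, strategy: Strategy, friction: float = 0.5) -> float:
    """Toy stand-in for a physics simulator (assumption, not the paper's simulator).

    The object moves at push_speed while pushed, then slides and decelerates
    at `friction` m/s^2 until it stops. Returns the final 1-D position.
    """
    pushed = strategy.push_speed * strategy.duration
    slide = strategy.push_speed ** 2 / (2.0 * friction)
    return start + pushed + slide


def sample_strategies(n: int, rng: random.Random) -> List[Strategy]:
    """Draw n candidate strategies uniformly from a fixed parameter range."""
    return [Strategy(rng.uniform(0.05, 0.5), rng.uniform(0.1, 2.0)) for _ in range(n)]


def select_strategy(start: float, goal: float, n_samples: int = 200,
                    seed: int = 0) -> Tuple[float, Strategy]:
    """Evaluate every candidate in the toy twin and return (error, best strategy)."""
    rng = random.Random(seed)
    candidates = sample_strategies(n_samples, rng)
    scored = [(abs(simulate_in_twin(start, s) - goal), s) for s in candidates]
    return min(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    error, best = select_strategy(start=0.0, goal=0.4)
    print(f"best strategy: {best}, predicted error: {error:.4f} m")
```

In a real system the strategy selected this way would then be executed on the robot; here the selection is purely simulated, and the quality of the choice depends entirely on how faithfully the twin models the scene.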

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics