RoboScape: Physics-informed Embodied World Model

Yu Shang; Xin Zhang; Yinzhou Tang; Lei Jin; Chen Gao; Wei Wu; Yong Li

arXiv:2506.23135 · cs.CV · July 1, 2025

Open Access

TL;DR

RoboScape introduces a physics-informed world model that jointly learns video generation and physical properties, improving realism and physical accuracy in robotic scenario simulations for embodied intelligence.

Contribution

RoboScape presents a unified framework that integrates physics knowledge into world modeling, improving 3D geometric consistency and motion dynamics in robotic video synthesis.

Findings

01. Superior visual fidelity and physical plausibility in generated videos
02. Effective for robotic policy training and evaluation
03. Advances in physics-informed world modeling for embodied AI

Abstract

World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling.…
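
The joint training described in the abstract can be pictured as a weighted sum of one loss per task. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the loss forms or weights, so the mean-squared-error terms, the function names `mse` and `joint_loss`, and the weights `w_depth` and `w_kp` are all hypothetical.

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    rgb_pred: Sequence[float], rgb_true: Sequence[float],
    depth_pred: Sequence[float], depth_true: Sequence[float],
    kp_pred: Sequence[float], kp_true: Sequence[float],
    w_depth: float = 1.0,  # assumed weight for the temporal depth task
    w_kp: float = 1.0,     # assumed weight for the keypoint dynamics task
) -> Dict[str, float]:
    """Combine the three task losses into one training objective.

    Assumption: each task uses plain MSE on flattened values. The paper's
    actual objectives may differ.
    """
    l_rgb = mse(rgb_pred, rgb_true)        # RGB video generation term
    l_depth = mse(depth_pred, depth_true)  # temporal depth prediction term
    l_kp = mse(kp_pred, kp_true)           # keypoint dynamics term
    total = l_rgb + w_depth * l_depth + w_kp * l_kp
    return {"rgb": l_rgb, "depth": l_depth, "keypoint": l_kp, "total": total}


# Toy usage with flattened placeholder values, not real data.
losses = joint_loss(
    rgb_pred=[0.0, 0.0], rgb_true=[1.0, 1.0],
    depth_pred=[0.0], depth_true=[2.0],
    kp_pred=[1.0], kp_true=[1.0],
    w_depth=0.5, w_kp=2.0,
)
```

Because the auxiliary terms are added to the same objective as the RGB term, gradients from depth and keypoint errors reach the shared model, which is one plausible reading of "jointly learns" in the abstract.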

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Social Robot Interaction and HRI