RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang, Lei Jin, Chen Gao, Wei Wu, Yong Li

TL;DR
RoboScape introduces a physics-informed world model that jointly learns video generation and physical properties, improving realism and physical accuracy in robotic scenario simulations for embodied intelligence.
Contribution
It presents a novel unified framework that integrates physics knowledge into world modeling, enhancing 3D geometric consistency and motion dynamics in robotic video synthesis.
Findings
Superior visual fidelity and physical plausibility in generated videos
Effective for robotic policy training and evaluation
Advances in physics-informed world modeling for embodied AI
Abstract
World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich robotic scenarios. In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. We introduce two key physics-informed joint training tasks: temporal depth prediction that enhances 3D geometric consistency in video rendering, and keypoint dynamics learning that implicitly encodes physical properties (e.g., object shape and material characteristics) while improving complex motion modeling.…
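The joint training described in the abstract can be pictured as a multi-task objective: one video-generation loss plus two auxiliary physics-informed losses. The sketch below is only an illustration under stated assumptions. The abstract does not give the loss forms or how the tasks are balanced, so the mean-squared-error terms, the `LossWeights` values, and the function names `mse` and `joint_loss` are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized flat sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical balancing weights; the source does not specify them.
    depth: float = 1.0
    keypoint: float = 1.0


def joint_loss(
    rgb_pred: Sequence[float], rgb_true: Sequence[float],
    depth_pred: Sequence[float], depth_true: Sequence[float],
    kp_pred: Sequence[float], kp_true: Sequence[float],
    weights: Optional[LossWeights] = None,
) -> Tuple[float, Dict[str, float]]:
    """Combine an RGB loss with depth and keypoint auxiliary losses.

    Assumption: every term is a plain MSE on flattened values. This stands in
    for whatever losses the actual model uses.
    """
    w = weights if weights is not None else LossWeights()
    parts = {
        "rgb": mse(rgb_pred, rgb_true),          # video generation term
        "depth": mse(depth_pred, depth_true),    # temporal depth prediction term
        "keypoint": mse(kp_pred, kp_true),       # keypoint dynamics term
    }
    total = parts["rgb"] + w.depth * parts["depth"] + w.keypoint * parts["keypoint"]
    return total, parts


if __name__ == "__main__":
    total, parts = joint_loss(
        rgb_pred=[1.0, 2.0], rgb_true=[1.0, 4.0],
        depth_pred=[0.5], depth_true=[1.5],
        kp_pred=[0.0, 0.0], kp_true=[3.0, 4.0],
        weights=LossWeights(depth=0.5, keypoint=0.1),
    )
    print(total, parts)
```

Keeping the per-task terms in a dictionary alongside the total makes it easy to monitor whether the auxiliary tasks are dominating or being ignored as the weights are tuned.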
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Social Robot Interaction and HRI
