Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
Mutian Xu, Tianbao Zhang, Tianqi Liu, Zhaoxi Chen, Xiaoguang Han, Ziwei Liu

TL;DR
Kinema4D is an action-conditioned 4D generative robotic simulator that models precise robot control and the environment's reactions to it, targeting physically plausible, geometry-consistent embodied simulation with potential for zero-shot transfer.
Contribution
The paper presents Kinema4D, a new 4D simulation framework that disentangles robot control and environmental reactions, along with a large-scale dataset Robo4D-200k for training and evaluation.
Findings
Effectively simulates physically-plausible interactions
Demonstrates geometry consistency and embodiment-agnostic behavior
Shows potential for zero-shot transfer in embodied simulation
Abstract
Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in leveraging video generation to transcend the rigid visual/physical constraints of traditional simulators. However, they primarily operate in 2D space or are guided by static environmental cues, ignoring the fundamental reality that robot-world interactions are inherently 4D spatiotemporal events that require precise interactive modeling. To restore this 4D essence while ensuring precise robot control, we introduce Kinema4D, a new action-conditioned 4D generative robotic simulator that disentangles the robot-world interaction into: i) Precise 4D representation of robot controls: we drive a URDF-based 3D robot via kinematics, producing a precise 4D robot control trajectory. ii) Generative 4D modeling of environmental reactions: we project the 4D robot trajectory into a…
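As a rough illustration of the first component the abstract describes (driving a robot model through kinematics to obtain a time-indexed 3D trajectory), the sketch below applies forward kinematics to a toy planar two-link arm. It is not the paper's pipeline: there is no URDF parsing, and the link lengths, time step, and the function names `forward_kinematics` and `rollout` are assumptions made up for this example.

```python
import math

def forward_kinematics(joint_angles, link_lengths):
    """Return the 3D position of each joint of a planar serial arm (z = 0).

    Hypothetical helper: angles are relative joint rotations in radians.
    """
    x, y, theta = 0.0, 0.0, 0.0
    points = [(x, y, 0.0)]  # base of the arm at the origin
    for angle, length in zip(joint_angles, link_lengths):
        theta += angle                  # accumulate rotation along the chain
        x += length * math.cos(theta)   # advance to the end of this link
        y += length * math.sin(theta)
        points.append((x, y, 0.0))
    return points

def rollout(action_sequence, link_lengths, dt=0.1):
    """Map a sequence of joint-angle commands to (time, joint positions) pairs.

    The result is a time-indexed sequence of 3D points, i.e. a simple
    stand-in for a "4D" robot trajectory.
    """
    return [
        (step * dt, forward_kinematics(q, link_lengths))
        for step, q in enumerate(action_sequence)
    ]

# Assumed toy arm: two links of length 1.0 and 0.5.
LINKS = [1.0, 0.5]
actions = [
    [0.0, 0.0],                   # arm stretched along +x
    [math.pi / 2, 0.0],           # arm stretched along +y
    [math.pi / 2, -math.pi / 2],  # first link up, second link along +x
]
trajectory = rollout(actions, LINKS)

for t, points in trajectory:
    end_effector = points[-1]
    print(f"t={t:.1f}s end effector at "
          f"({end_effector[0]:.2f}, {end_effector[1]:.2f}, {end_effector[2]:.2f})")
```

Because the robot's motion here is computed deterministically from the commanded joint angles, it stays exact regardless of what any downstream generative model produces; that separation is the idea the abstract refers to as disentangling control from environmental reaction.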
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
