RealWonder: Real-Time Physical Action-Conditioned Video Generation
Wei Liu, Ziyu Chen, Zizhang Li, Yue Wang, Hong-Xing Yu, Jiajun Wu

TL;DR
RealWonder is a real-time system that generates action-conditioned videos from a single image by integrating physics simulation with video generation, enabling interactive exploration of physical effects in various materials.
Contribution
It introduces a novel approach combining physics simulation with video generation for real-time, action-conditioned video synthesis from a single image.
Findings
Achieves 13.2 FPS at 480x832 resolution.
Enables interactive exploration of physical effects.
Supports diverse materials like fluids and deformable bodies.
Abstract
Current video generation models cannot simulate physical consequences of 3D actions like forces and robotic manipulations, as they lack structural understanding of how actions affect 3D scenes. We present RealWonder, the first real-time system for action-conditioned video generation from a single image. Our key insight is using physics simulation as an intermediate bridge: instead of directly encoding continuous actions, we translate them through physics simulation into visual representations (optical flow and RGB) that video models can process. RealWonder integrates three components: 3D reconstruction from single images, physics simulation, and a distilled video generator requiring only 4 diffusion steps. Our system achieves 13.2 FPS at 480x832 resolution, enabling interactive exploration of forces, robot actions, and camera controls on rigid objects, deformable bodies, fluids, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Motion and Animation · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning
