RealWonder: Real-Time Physical Action-Conditioned Video Generation

Wei Liu; Ziyu Chen; Zizhang Li; Yue Wang; Hong-Xing Yu; Jiajun Wu

arXiv:2603.05449 · cs.CV · March 6, 2026

Open Access

TL;DR

RealWonder is a real-time system that generates action-conditioned videos from a single image by integrating physics simulation with video generation, enabling interactive exploration of physical effects across various materials.

Contribution

RealWonder combines physics simulation with video generation to synthesize action-conditioned video in real time from a single image.

Findings

1. Achieves 13.2 FPS at 480×832 resolution.
2. Enables interactive exploration of physical effects driven by forces, robot actions, and camera controls.
3. Supports diverse materials, including rigid objects, deformable bodies, and fluids.
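The reported throughput translates into a concrete per-frame budget: 1000 / 13.2 ≈ 75.8 ms per frame, and 480 × 832 = 399,360 pixels per frame, or roughly 5.27 million output pixels per second. The short script below reproduces this arithmetic. It is our own back-of-the-envelope calculation using only the two reported numbers; it says nothing about how the budget is divided among reconstruction, simulation, and the video generator.

```python
FPS = 13.2                 # reported throughput
HEIGHT, WIDTH = 480, 832   # reported output resolution


def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame at a steady frame rate."""
    return 1000.0 / fps


def pixel_throughput(height: int, width: int, fps: float) -> float:
    """Output pixels produced per second at the given resolution and frame rate."""
    return height * width * fps


if __name__ == "__main__":
    print(f"{frame_budget_ms(FPS):.1f} ms per frame")
    print(f"{HEIGHT * WIDTH:,} pixels per frame")
    print(f"{pixel_throughput(HEIGHT, WIDTH, FPS) / 1e6:.2f} Mpx/s")
```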

Abstract

Current video generation models cannot simulate the physical consequences of 3D actions such as forces and robotic manipulations, because they lack a structural understanding of how actions affect 3D scenes. We present RealWonder, the first real-time system for action-conditioned video generation from a single image. Our key insight is to use physics simulation as an intermediate bridge: instead of directly encoding continuous actions, we translate them through physics simulation into visual representations (optical flow and RGB) that video models can process. RealWonder integrates three components: 3D reconstruction from a single image, physics simulation, and a distilled video generator that requires only 4 diffusion steps. Our system achieves 13.2 FPS at 480×832 resolution, enabling interactive exploration of forces, robot actions, and camera controls on rigid objects, deformable bodies, fluids, and…
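To make the "physics as a bridge" idea concrete, here is a minimal sketch of how a continuous action (a force) can be turned into optical flow that a video model could consume. It is not the authors' simulator or renderer: the point-mass particles, the semi-implicit Euler integrator, and the pinhole intrinsics `K` are all illustrative assumptions, and the output is sparse per-particle flow rather than a dense flow field. The helper names (`Particle`, `step`, `project`, `sparse_flow`) are hypothetical.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Particle:
    pos: Vec3    # camera-space position (x, y, z), z > 0 in front of the camera
    vel: Vec3    # velocity in scene units per second
    mass: float


def step(particles: list[Particle], force: Vec3, dt: float) -> list[Particle]:
    """Advance every particle by one semi-implicit Euler step under a constant
    external force (the continuous 'action'). Gravity and contacts are ignored."""
    out = []
    for p in particles:
        # Update velocity first, then position with the new velocity.
        vel = tuple(v + (f / p.mass) * dt for v, f in zip(p.vel, force))
        pos = tuple(x + v * dt for x, v in zip(p.pos, vel))
        out.append(Particle(pos, vel, p.mass))
    return out


def project(pos: Vec3, fx: float, fy: float, cx: float, cy: float) -> tuple[float, float]:
    """Pinhole projection of a camera-space point to pixel coordinates (u, v)."""
    x, y, z = pos
    return (fx * x / z + cx, fy * y / z + cy)


def sparse_flow(before: list[Particle], after: list[Particle], intrinsics) -> list[tuple[float, float]]:
    """Per-particle 2D optical flow: pixel displacement between two simulated states."""
    flows = []
    for p0, p1 in zip(before, after):
        u0, v0 = project(p0.pos, *intrinsics)
        u1, v1 = project(p1.pos, *intrinsics)
        flows.append((u1 - u0, v1 - v0))
    return flows


if __name__ == "__main__":
    # Hypothetical intrinsics with the principal point at the centre of an 832x480 frame.
    K = (500.0, 500.0, 416.0, 240.0)
    scene = [Particle((0.0, 0.0, 2.0), (0.0, 0.0, 0.0), 1.0),
             Particle((0.5, -0.2, 3.0), (0.0, 0.0, 0.0), 2.0)]
    push = (1.0, 0.0, 0.0)  # sideways push along +x, arbitrary force units
    nxt = step(scene, push, dt=0.1)
    print(sparse_flow(scene, nxt, K))
```

In this toy example the same push yields about 2.5 px of horizontal flow for the light, near particle and about 0.83 px for the heavier, farther one, so mass and depth show up directly in the 2D signal. That is the kind of structure a flow-conditioned video model can exploit without having to learn a mapping from raw force vectors.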

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning