3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation
Hwidong Kim, Yunho Kim, Tae-Kyun Kim

TL;DR
3DPhysVideo is a training-free, physics-guided pipeline that generates physically realistic videos from a single image. It combines novel view synthesis for 3D scene reconstruction, physics simulation on the reconstructed geometry, and consistency-guided video synthesis.
Contribution
It proposes a novel consistency-guided flow SDE method that enables physically plausible video generation from a single image without training, leveraging existing video models and physics simulation.
Findings
Outperforms state-of-the-art baselines on GPT scores and the VideoPhy benchmark.
Successfully generates multi-object and fluid interaction videos from single images.
Efficiently runs on a single consumer GPU.
Abstract
Video generative models have made remarkable progress, yet they often yield visual artifacts that violate grounding in physical dynamics. Recent works such as PhysGen3D tackle single image-to-3D physics through mesh reconstruction and Physically-Based Rendering, but challenges remain in modeling fluid dynamics, multi-object interactions and photorealism. This work introduces 3DPhysVideo, a novel training-free pipeline that generates physically realistic videos from a single image. We repurpose an off-the-shelf video model for two stages. First, we use it as a novel view synthesizer to reconstruct complete 360-degree 3D scene geometry by guiding the image-to-video (I2V) flow model with rendered point clouds. Second, after applying physics solvers to this geometry, the physically simulated point cloud is used to guide the same I2V flow model to synthesize final, high-quality videos.…
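The idea of guiding a flow model with a rendered point cloud can be illustrated with a toy example. The sketch below is not the paper's consistency-guided flow SDE. It swaps in a simpler, generic inpainting-style replacement guidance on a deterministic Euler sampler, and it assumes a linear noise-to-data path. Every name here (`toy_velocity`, `guided_sample`, `rendered`, `mask`, `model_target`) is hypothetical, and the "model" is a closed-form stand-in for a pretrained I2V network.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a pretrained flow model's velocity field.

    On the linear path x_t = (1 - t) * noise + t * data, the velocity that
    transports x_t to `target` is (target - x_t) / (1 - t).
    """
    denom = max(1.0 - t, 1e-3)  # avoid division by zero near t = 1
    return [(tg - xi) / denom for xi, tg in zip(x, target)]


def guided_sample(rendered, mask, model_target, steps=50, seed=0):
    """Euler sampling with replacement guidance (an assumption, not the paper's method).

    rendered:     flat list of pixel values from the rendered point cloud
    mask:         1.0 where the point cloud covers the pixel, else 0.0
    model_target: what the toy model would produce on its own
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in rendered]
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Unguided Euler step along the model's velocity field.
        v = toy_velocity(x, t, model_target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Re-impose the rendered content, noised to the new time level,
        # wherever the point cloud provides evidence.
        t_next = (i + 1) * dt
        known = [(1.0 - t_next) * n + t_next * r for n, r in zip(noise, rendered)]
        x = [m * k + (1.0 - m) * xi for m, k, xi in zip(mask, known, x)]
    return x
```

In this toy, pixels covered by the mask end at the rendered values, while uncovered pixels are left to the model. That mirrors, in a very simplified way, how a rendered point cloud can constrain visible regions while the video model fills in the rest.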
