4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

TL;DR
This paper presents a novel pipeline for photorealistic 4D scene generation from text, leveraging video diffusion models trained on real-world data, and introduces techniques for canonical representation and deformation modeling to improve realism and consistency.
Contribution
It introduces a new method that generates photorealistic 4D scenes without relying on multi-view generative models, using video diffusion models and deformation learning.
Findings
Produces highly photorealistic 4D scenes from text prompts.
Outperforms existing methods in realism and structural consistency.
Enables multi-view visualization of dynamic scenes.
Abstract
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal…
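The joint fitting step described in the abstract (a shared canonical representation plus a per-frame deformation that absorbs inconsistencies across freeze-time frames) can be illustrated with a toy sketch. This is not the paper's implementation: it works on plain 3D points rather than the paper's scene representation, and the function name `fit_canonical_with_deformation`, the L2 penalty weight `lam`, and the optimizer settings are all assumptions made for illustration.

```python
import numpy as np

def fit_canonical_with_deformation(obs, lam=1.0, lr=0.05, steps=2000):
    """Toy joint fit of a canonical point set and per-frame offsets.

    obs: array of shape (F, N, 3), one noisy copy of the scene per frame.
    Minimizes sum_f ||c + d_f - obs_f||^2 + lam * sum_f ||d_f||^2,
    where c is the canonical set and d_f the (hypothetical) per-frame
    deformation. The L2 penalty is an assumption, not the paper's loss.
    """
    num_frames = obs.shape[0]
    canonical = obs[0].copy()        # initialize from the first frame
    deform = np.zeros_like(obs)      # start with no deformation
    for _ in range(steps):
        residual = canonical[None] + deform - obs          # (F, N, 3)
        # Gradient w.r.t. the canonical points, averaged over frames
        grad_canonical = 2.0 * residual.sum(axis=0) / num_frames
        # Gradient w.r.t. each frame's offsets, including the penalty
        grad_deform = 2.0 * residual + 2.0 * lam * deform
        canonical -= lr * grad_canonical
        deform -= lr * grad_deform
    return canonical, deform

# Synthetic example: 4 inconsistent "frames" of the same 5 points.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))
obs = base + 0.1 * rng.normal(size=(4, 5, 3))
canonical, deform = fit_canonical_with_deformation(obs)
```

In this toy setting the canonical points settle at the per-point average of the frames, and each frame's offset explains part of its remaining disagreement. The penalty weight controls how much inconsistency the deformation is allowed to absorb instead of the canonical shape.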
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques
