4Diffusion: Multi-view Video Diffusion Model for 4D Generation
Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao

TL;DR
This paper introduces 4Diffusion, a novel 4D generation pipeline that uses a multi-view video diffusion model with a learnable motion module and a new loss function to produce spatial-temporally consistent 4D content from monocular videos.
Contribution
The paper presents a unified multi-view video diffusion model with a learnable motion module and a 4D-aware loss, improving temporal consistency and detail in 4D generation from monocular video inputs.
Findings
Achieves superior spatial-temporal consistency in 4D generation.
Outperforms previous methods in qualitative and quantitative evaluations.
Effectively preserves appearance details and dynamic consistency.
Abstract
Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models. However, these methods lack multi-view spatial-temporal modeling and encounter challenges in integrating diverse prior knowledge from multiple diffusion models, resulting in inconsistent temporal appearance and flickers. In this paper, we propose a novel 4D generation pipeline, namely 4Diffusion, aimed at generating spatial-temporally consistent 4D content from a monocular video. We first design a unified diffusion model tailored for multi-view video generation by incorporating a learnable motion module into a frozen 3D-aware diffusion model to capture multi-view spatial-temporal correlations. After training on a curated dataset, our diffusion model acquires reasonable temporal consistency and inherently preserves the generalizability and spatial consistency of the…
Taxonomy
Topics: Multimedia Communication and Technology
Methods: Diffusion
