Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

Hanwen Liang; Yuyang Yin; Dejia Xu; Hanxue Liang; Zhangyang Wang; Konstantinos N. Plataniotis; Yao Zhao; Yunchao Wei

arXiv:2405.16645·cs.CV·May 28, 2024

TL;DR

Diffusion4D introduces an efficient framework for 4D content generation that ensures spatial-temporal consistency, leveraging a novel 4D-aware diffusion model, a curated dataset, and explicit 4D construction techniques.

Contribution

The paper presents a new scalable 4D diffusion framework that improves speed and consistency in 4D asset generation, incorporating novel metrics and reconstruction losses.

Findings

01. Outperforms prior methods in efficiency and 4D consistency

02. Generates high-fidelity 4D assets within minutes

03. Achieves superior multi-view and temporal coherence

Abstract

The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization speeds and multi-view inconsistency issues. Spatial consistency and temporal consistency in 4D geometry have been extensively explored in 3D-aware diffusion models and in traditional monocular video diffusion models, respectively. Building on this foundation, we propose a strategy to migrate the temporal consistency in video diffusion models to the spatial-temporal consistency required for 4D generation. Specifically, we present a novel framework, Diffusion4D, for efficient and scalable 4D content…

Code & Models

Datasets

hw-liang/Diffusion4D
dataset · 306 downloads

Videos

Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Computer Graphics and Visualization Techniques

Methods: Sparse Evolutionary Training · Diffusion