4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation

Shuzhou Yang; Xiaodong Cun; Xiaoyu Li; Yaowei Li; Jian Zhang

arXiv:2508.04467 · cs.CV · August 7, 2025

TL;DR

4DVD introduces a cascaded video diffusion model that decouples 4D content generation into coarse multi-view layout prediction and structure-aware conditional generation. From a monocular video, it produces high-quality dense-view videos with strong cross-view and temporal consistency.

Contribution

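The paper proposes a cascaded diffusion approach to 4D content generation that separates coarse layout prediction from detailed, structure-aware synthesis. This improves quality and consistency over prior multi-view video methods, which model 3D space and time jointly with stacked cross-view/temporal attention modules.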

Findings

1. Achieves state-of-the-art results in 4D video synthesis.
2. Demonstrates superior cross-view and temporal consistency.
3. Introduces a new dataset, D-Objaverse, for training and evaluation.

Abstract

Given the high complexity of directly generating high-dimensional data such as 4D content, we present 4DVD, a cascaded video diffusion model that generates 4D content in a decoupled manner. Unlike previous multi-view video methods that model 3D space and temporal features simultaneously with stacked cross-view/temporal attention modules, 4DVD decouples the task into two subtasks, coarse multi-view layout generation and structure-aware conditional generation, and effectively unifies them. Specifically, given a monocular video, 4DVD first predicts dense-view content of its layout with superior cross-view and temporal consistency. Based on the resulting layout priors, a structure-aware spatio-temporal generation branch combines these coarse structural priors with the fine-grained appearance content of the input monocular video to generate the final high-quality dense-view videos.…
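The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_layout`, `refine`, and `generate_4d` are hypothetical placeholder functions that use simple NumPy operations in place of the paper's diffusion models, and the (views, frames, height, width, channels) tensor layout is also an assumption. The sketch only shows what each stage consumes and produces.

```python
import numpy as np


def predict_layout(video: np.ndarray, num_views: int) -> np.ndarray:
    """Hypothetical stage-1 stand-in: monocular video (T, H, W, C) ->
    coarse dense-view layout (V, T, H, W, C).

    4DVD uses a video diffusion model here; this placeholder only removes
    fine detail (2x2 average pooling) and tiles the result across views.
    Assumes H and W are even.
    """
    t, h, w, c = video.shape
    pooled = video.reshape(t, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))
    coarse = pooled.repeat(2, axis=1).repeat(2, axis=2)  # back to (T, H, W, C)
    return np.broadcast_to(coarse, (num_views, t, h, w, c)).copy()


def refine(layout: np.ndarray, video: np.ndarray) -> np.ndarray:
    """Hypothetical stage-2 stand-in: combine the coarse layout (structure)
    with the input video (appearance) to get dense-view videos.

    Placeholder logic: add back the detail that stage 1 removed, copied
    identically to every view. A real model would synthesize view-specific
    appearance instead of copying it.
    """
    detail = video - layout[0]  # assumption: view 0 matches the input camera
    return layout + detail[None]


def generate_4d(video: np.ndarray, num_views: int) -> np.ndarray:
    """Cascade: predict the layout first, then run structure-aware refinement."""
    layout = predict_layout(video, num_views)
    return refine(layout, video)
```

For example, `generate_4d(np.random.default_rng(0).random((8, 32, 32, 3)), num_views=4)` returns an array of shape `(4, 8, 32, 32, 3)`. The split mirrors the abstract's idea: stage 1 only has to make structure consistent across views and time, while stage 2 reuses appearance from the input video instead of generating everything jointly.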
