SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Yiming Xie; Chun-Han Yao; Vikram Voleti; Huaizu Jiang; Varun Jampani

arXiv:2407.17470·cs.CV·March 3, 2025

TL;DR

SV4D introduces a unified diffusion model that generates temporally consistent multi-view videos of dynamic 3D objects from a single monocular video, enabling efficient 4D content creation.

Contribution

We propose a latent diffusion approach that generates dynamic 3D content consistent across both video frames and camera views, unifying video synthesis and novel view synthesis in a single model.

Findings

01 Achieves state-of-the-art results on novel-view video synthesis

02 Enables efficient optimization of an implicit 4D representation (dynamic NeRF) without cumbersome SDS-based optimization

03 Generates high-quality, temporally consistent multi-view videos

Abstract

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and novel view synthesis, we design a unified diffusion model to generate novel view videos of dynamic 3D objects. Specifically, given a monocular reference video, SV4D generates novel views for each video frame that are temporally consistent. We then use the generated novel view videos to optimize an implicit 4D representation (dynamic NeRF) efficiently, without the need for cumbersome SDS-based optimization used in most prior works. To train our unified novel view video generation model, we curate a dynamic 3D object dataset from the existing Objaverse dataset. Extensive experimental results on multiple datasets and user studies demonstrate SV4D's…
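The data layout implied by the abstract can be sketched as a grid of images indexed by view and frame. This is only a minimal illustration, not the SV4D implementation: `placeholder_generate_grid`, the string stand-ins for images, and the choice of 4 views are all assumptions made for the example, and the placeholder does no diffusion at all.

```python
from typing import List

Frame = str  # stand-in for an image; a real pipeline would hold pixel arrays


def placeholder_generate_grid(reference_video: List[Frame], num_views: int) -> List[List[Frame]]:
    """Hypothetical stand-in for a unified novel-view video generator.

    Returns grid[v][f], the image for view v at frame f. The whole grid is
    produced in one call to mirror the idea of a single model handling both
    the view axis and the time axis; here each cell is just a label.
    """
    return [
        [f"{frame}@view{v}" for frame in reference_video]
        for v in range(num_views)
    ]


# A monocular reference video with three frames (assumed length).
reference_video = ["frame0", "frame1", "frame2"]
grid = placeholder_generate_grid(reference_video, num_views=4)

# A row is one novel-view video: a fixed view across all frames.
novel_view_video = grid[1]

# A column is one multi-view set: all views at a fixed frame.
multi_view_at_frame_2 = [row[2] for row in grid]
```

Rows of the grid are where temporal consistency matters, and columns are where multi-view consistency matters. The full grid is what a downstream 4D representation such as a dynamic NeRF would be fitted to.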

Videos

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency · slideslive

Taxonomy

Methods: Diffusion