CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Rundi Wu; Ruiqi Gao; Ben Poole; Alex Trevithick; Changxi Zheng; Jonathan T. Barron; Aleksander Holynski

arXiv:2411.18613·cs.CV·December 20, 2024

Open Access

TL;DR

CAT4D uses a multi-view video diffusion model to convert a monocular video into a dynamic 4D scene, enabling novel view synthesis at chosen camera poses and timestamps and 4D reconstruction, with competitive results on benchmarks.

Contribution

The paper proposes a multi-view video diffusion model that synthesizes novel views at specified camera poses and timestamps, together with a new sampling method that turns a single monocular video into a multi-view video suitable for 4D reconstruction.

Findings

01. Competitive performance on view synthesis benchmarks

02. Effective 4D scene reconstruction from monocular videos

03. Demonstrated creative 4D scene generation capabilities

Abstract

We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: https://cat-4d.github.io/.
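
To make the phrase "transform a single monocular video into a multi-view video" concrete, the sketch below enumerates the (camera pose, timestamp) pairs that such a multi-view video would cover and marks which ones the input already supplies. It is only an illustration of the bookkeeping, not the paper's sampling method: the names `Query` and `build_query_grid` are hypothetical, and it assumes that input frame t was captured from camera pose t.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    camera: int     # index of a camera pose (hypothetical indexing scheme)
    time: int       # index of a timestamp in the input video
    observed: bool  # True if the input video already provides this frame


def build_query_grid(num_frames: int, target_cameras: list[int]) -> list[Query]:
    """Enumerate every (camera, time) pair for the chosen cameras.

    Assumption: input frame t was captured from camera pose t, so for a
    given camera the only observed frame is the one where time == camera.
    """
    return [
        Query(camera=c, time=t, observed=(c == t))
        for c in target_cameras
        for t in range(num_frames)
    ]


# A 4-frame input video, with fixed viewpoints at the poses of frames 0 and 3.
grid = build_query_grid(num_frames=4, target_cameras=[0, 3])

# Frames that a generative model would have to synthesize.
missing = [q for q in grid if not q.observed]
```

In this toy setup each fixed viewpoint has exactly one observed frame, and every other cell of the grid is a frame that must be generated before a 4D representation can be fit to the full multi-view video.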

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Video Coding and Compression Technologies · Image and Video Quality Assessment

Methods: Diffusion