CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski

TL;DR
CAT4D introduces a diffusion-based approach for converting monocular videos into dynamic 4D scenes. It enables novel view synthesis at specified camera poses and timestamps, as well as 4D reconstruction, and achieves competitive results on novel view synthesis and dynamic scene reconstruction benchmarks.
Contribution
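The paper proposes a multi-view video diffusion model, trained on a diverse combination of datasets, that generates views at specified camera poses and timestamps. It pairs this model with a new sampling method that turns a single monocular video into a multi-view video, which is then used for 4D scene reconstruction.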
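Because the model is queried "at any specified camera poses and timestamps", one way to picture lifting a monocular video of T frames into a multi-view video is as filling a grid of C target cameras by T timestamps. The sketch below only enumerates that grid. The names ViewQuery, build_query_grid, and alternating_batches are hypothetical, and splitting the grid into fixed-time batches (multi-view sets) and fixed-camera batches (videos) is an assumption about how such a grid could be chunked for a model that handles a limited number of frames per pass, not a description of the paper's sampling method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ViewQuery:
    camera_id: int  # index into a hypothetical list of target camera poses
    timestamp: int  # frame index in the input monocular video


def build_query_grid(num_cameras: int, num_frames: int) -> List[ViewQuery]:
    """Every (camera, time) pair the model would be asked to synthesize."""
    return [ViewQuery(c, t) for t in range(num_frames) for c in range(num_cameras)]


def alternating_batches(
    num_cameras: int, num_frames: int
) -> Tuple[List[List[ViewQuery]], List[List[ViewQuery]]]:
    """Assumed split of the grid two ways.

    fixed_time:   one batch per timestamp, containing all cameras (a multi-view set)
    fixed_camera: one batch per camera, containing all timestamps (a video)
    """
    fixed_time = [[ViewQuery(c, t) for c in range(num_cameras)] for t in range(num_frames)]
    fixed_camera = [[ViewQuery(c, t) for t in range(num_frames)] for c in range(num_cameras)]
    return fixed_time, fixed_camera
```

For example, 4 target cameras and an 8-frame input give 32 queries, covered either by 8 multi-view batches of 4 views or by 4 video batches of 8 frames. Both partitions cover the same grid, which is what would let a sampler move between enforcing consistency across views and consistency across time.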
Findings
Competitive performance on view synthesis benchmarks
Effective 4D scene reconstruction from monocular videos
Demonstrated creative 4D scene generation capabilities
Abstract
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: https://cat-4d.github.io/.
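The final step, "optimization of a deformable 3D Gaussian representation", can be pictured with a toy model in which each Gaussian keeps canonical, time-independent parameters plus a time-dependent offset of its mean. This sketch assumes a linear motion model, mean(t) = center + t * velocity, purely for illustration. DeformableGaussian and fit_velocity are hypothetical names, and the paper's actual deformation model and rendering-based loss are not reproduced here. Instead of rendering Gaussians and comparing pixels against the generated multi-view video, the toy fits the velocity to known 3D target positions with plain gradient descent.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DeformableGaussian:
    center: Vec3    # canonical (time-independent) mean
    velocity: Vec3  # hypothetical linear motion: mean(t) = center + t * velocity
    scale: Vec3     # per-axis standard deviation; carried along, not optimized here
    opacity: float  # carried along, not optimized here (rotation and color omitted)

    def center_at(self, t: float) -> Vec3:
        """Deformed mean of this Gaussian at time t."""
        return tuple(c + t * v for c, v in zip(self.center, self.velocity))


def fit_velocity(
    g: DeformableGaussian,
    times: Sequence[float],
    targets: Sequence[Vec3],
    lr: float = 0.1,
    steps: int = 500,
) -> None:
    """Toy optimization: minimize the mean squared distance between center_at(t) and targets."""
    n = len(times)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for t, target in zip(times, targets):
            pos = g.center_at(t)
            for k in range(3):
                # d/dv of (center + t*v - target)^2 is 2 * (pos - target) * t
                grad[k] += 2.0 * (pos[k] - target[k]) * t
        # Update all velocity components after accumulating the full gradient.
        g.velocity = tuple(v - lr * gk / n for v, gk in zip(g.velocity, grad))
```

In the toy, only the deformation parameters move while the canonical Gaussian stays fixed. Replacing the squared-distance term with a photometric loss between rendered images and the generated views is what turns this into an actual 4D reconstruction problem.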
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Video Coding and Compression Technologies · Image and Video Quality Assessment
Methods: Diffusion
