Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

Basile Van Hoorick; Rundi Wu; Ege Ozguroglu; Kyle Sargent; Ruoshi Liu; Pavel Tokmakov; Achal Dave; Changxi Zheng; Carl Vondrick

arXiv:2405.14868·cs.CV·July 8, 2024

Open Access

TL;DR

The paper introduces GCD, a monocular dynamic view synthesis method that generates new viewpoints from a single video without requiring depth or explicit 3D modeling, leveraging diffusion priors for zero-shot real-world generalization.

Contribution

GCD is a novel end-to-end pipeline that synthesizes dynamic views from monocular videos using diffusion priors, without depth input or explicit 3D modeling, trained solely on synthetic data.

Findings

01 Zero-shot real-world generalization across multiple domains

02 Effective dynamic view synthesis without depth or 3D geometry

03 Potential applications in robotics and virtual reality

Abstract

Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups, and significantly restricting their utility in the wild as well as in terms of embodied AI applications. In this paper, we propose GCD, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors to, given a video of any scene, generate a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters. Our model does not require depth as input, and does not explicitly model 3D scene geometry, instead performing end-to-end video-to-video translation in order to achieve its goal efficiently. Despite being trained on…
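The abstract says the output video is conditioned on a set of relative camera pose parameters. As a rough illustration of what a relative pose between an input and a target viewpoint can look like, the sketch below composes two camera-to-world rigid transforms. This is a generic convention assumed for illustration only; the abstract does not state how GCD parameterizes the pose, and the helper names (`yaw_pose`, `rigid_inverse`, `relative_pose`) are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(t):
    """Invert a 4x4 rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def yaw_pose(yaw_deg, position):
    """Hypothetical helper: camera-to-world pose from a rotation about z and a position."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    x, y, z = position
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def relative_pose(src_c2w, tgt_c2w):
    """Transform that maps source-camera coordinates into target-camera coordinates."""
    return matmul(rigid_inverse(tgt_c2w), src_c2w)

# Assumed example: the target camera sits one unit along world x, rotated 90 degrees.
src = yaw_pose(0.0, (0.0, 0.0, 0.0))
tgt = yaw_pose(90.0, (1.0, 0.0, 0.0))
rel = relative_pose(src, tgt)
```

Here `rel` expresses the source camera's frame in the target camera's frame, so its last column is the source camera's position as seen from the target view. A model conditioned on relative pose would receive some encoding of this quantity rather than the two absolute poses.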

Taxonomy

Topics: Advanced Vision and Imaging · Satellite Image Processing and Photogrammetry · Image Processing Techniques and Applications

Methods: Sparse Evolutionary Training · Diffusion