Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich, Rares Ambrus

TL;DR
This paper introduces MVGD, a diffusion-based model that generates images and depth maps from novel viewpoints directly at the pixel level. Trained on more than 60 million multi-view samples, it reports state-of-the-art results in novel view synthesis and depth estimation.
Contribution
The paper presents a novel diffusion architecture for joint image and depth generation from multiple views, with efficient training strategies and state-of-the-art performance.
Findings
Achieves state-of-the-art results in novel view synthesis benchmarks.
Effectively generates consistent images and depth maps from sparse input views.
Demonstrates scalable training techniques for large diffusion models.
Abstract
Current methods for 3D scene reconstruction from sparse posed images employ intermediate 3D representations such as neural fields, voxel grids, or 3D Gaussians, to achieve multi-view consistent scene appearance and geometry. In this paper we introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints, given an arbitrary number of input views. Our method uses raymap conditioning to both augment visual features with spatial information from different viewpoints, as well as to guide the generation of images and depth maps from novel views. A key aspect of our approach is the multi-task generation of images and depth maps, using learnable task embeddings to guide the diffusion process towards specific modalities. We train this model on a collection of more than 60 million multi-view samples from publicly available…
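As a rough illustration of what a raymap is, the sketch below computes a per-pixel ray origin and unit direction from pinhole intrinsics and a camera-to-world pose. This is a common construction, not necessarily the parametrization MVGD uses; the function name `compute_raymap`, the pixel-center convention, and the pose convention are all assumptions made for this example.

```python
import numpy as np

def compute_raymap(K, cam_to_world, height, width):
    """Return per-pixel ray origins and unit directions in world coordinates.

    Assumptions (not taken from the paper): K is a 3x3 pinhole intrinsics
    matrix, cam_to_world is a 4x4 pose matrix, and pixel centers sit at
    half-integer coordinates.
    """
    # Pixel-center coordinates; u varies along columns, v along rows.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project homogeneous pixels into camera-frame directions.
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate directions into the world frame and normalize to unit length.
    rotation = cam_to_world[:3, :3]
    translation = cam_to_world[:3, 3]
    dirs_world = dirs_cam @ rotation.T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Every ray of a pinhole camera starts at the camera center.
    origins = np.broadcast_to(translation, dirs_world.shape).copy()
    return origins, dirs_world

# Hypothetical camera: 64x48 image, focal length 100 px, identity pose.
K = np.array([[100.0, 0.0, 32.5],
              [0.0, 100.0, 24.5],
              [0.0, 0.0, 1.0]])
origins, directions = compute_raymap(K, np.eye(4), height=48, width=64)
```

The two returned arrays have the same spatial layout as the image, so they can be concatenated with or added to per-pixel features; how MVGD actually encodes and injects them is not specified in this excerpt.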
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Optical Imaging Technologies · Image Processing Techniques and Applications
Methods: Diffusion
