Novel View Synthesis with Diffusion Models
Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, Mohammad Norouzi

TL;DR
This paper introduces 3DiM, a diffusion model for 3D-consistent novel view synthesis from a single image. It uses stochastic conditioning to improve multi-view consistency without relying on explicit geometry.
Contribution
The paper proposes stochastic conditioning, a sampling technique for pose-conditional image-to-image diffusion models that enables high-fidelity, 3D-consistent view synthesis from a single image without explicit geometry or test-time optimization.
Findings
3DiM achieves higher fidelity than prior methods.
Stochastic conditioning improves 3D consistency over a naive sampler that conditions on a single fixed view.
The model scales easily to many scenes.
Abstract
We present 3DiM, a diffusion model for 3D novel view synthesis, which is able to translate a single input view into consistent and sharp completions across many views. The core component of 3DiM is a pose-conditional image-to-image diffusion model, which takes a source view and its pose as inputs, and generates a novel view for a target pose as output. 3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning. The output views are generated autoregressively, and during the generation of each novel view, one selects a random conditioning view from the set of available views at each denoising step. We demonstrate that stochastic conditioning significantly improves the 3D consistency of a naive sampler for an image-to-image diffusion model, which involves conditioning on a single fixed view. We compare 3DiM to prior work on the SRN…
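The sampling loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for one reverse step of a trained pose-conditional diffusion model (it ignores the poses entirely), and the names `sample_views`, `num_steps`, and `seed` are invented for this example. The only part taken from the abstract is the control flow: views are generated autoregressively, and at every denoising step a conditioning view is drawn at random from the set of views available so far.

```python
import numpy as np


def toy_denoise_step(z, cond_image, cond_pose, target_pose, t, num_steps, rng):
    """Hypothetical placeholder for one reverse-diffusion step.

    A real model would predict a less noisy image from (z, cond_image,
    cond_pose, target_pose, t). This toy version ignores the poses and
    simply pulls z toward the conditioning image while adding a little
    noise that shrinks as t approaches 0.
    """
    alpha = 1.0 / (t + 1)
    noise_scale = 0.01 * t / num_steps
    noise = rng.standard_normal(z.shape)
    return (1.0 - alpha) * z + alpha * cond_image + noise_scale * noise


def sample_views(input_image, input_pose, target_poses, num_steps=50, seed=0,
                 denoise_step=toy_denoise_step):
    """Autoregressive sampling with stochastic conditioning (sketch)."""
    rng = np.random.default_rng(seed)
    # The set of available views starts with the single input view.
    views = [(input_image, input_pose)]
    for target_pose in target_poses:
        # Each novel view starts from pure Gaussian noise.
        z = rng.standard_normal(input_image.shape)
        for t in range(num_steps, 0, -1):
            # Stochastic conditioning: pick a random available view
            # at every denoising step, not once per generated view.
            k = int(rng.integers(len(views)))
            cond_image, cond_pose = views[k]
            z = denoise_step(z, cond_image, cond_pose, target_pose,
                             t, num_steps, rng)
        # The finished view becomes available for conditioning later views.
        views.append((z, target_pose))
    return views


if __name__ == "__main__":
    image = np.zeros((8, 8, 3))
    poses = [np.eye(4) for _ in range(3)]  # placeholder camera poses
    out = sample_views(image, np.eye(4), poses, num_steps=10)
    print(len(out), out[-1][0].shape)
```

The naive sampler mentioned in the abstract corresponds to fixing `k = 0` for every step, so each new view only ever sees the original input view.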
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
Methods: Stable Rank Normalization · Diffusion
