MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller, Katja Schwarz, Barbara Roessle, Lorenzo Porzi, Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder

TL;DR
MultiDiff is a new method that synthesizes consistent novel views from a single image by leveraging depth priors and video diffusion models, achieving high-quality results with improved stability and efficiency.
Contribution
It introduces a novel approach combining monocular depth and video diffusion priors for stable, multi-view consistent scene synthesis from a single image.
Findings
Outperforms state-of-the-art methods on the RealEstate10K and ScanNet datasets.
Produces high-quality, multi-view consistent results for long-term scene generation.
Supports multi-view consistent editing without additional tuning.
Abstract
We introduce MultiDiff, a novel approach for consistent novel view synthesis of scenes from a single RGB image. The task of synthesizing novel views from a single reference image is highly ill-posed by nature, as there exist multiple plausible explanations for unobserved areas. To address this issue, we incorporate strong priors in the form of monocular depth predictors and video-diffusion models. Monocular depth enables us to condition our model on warped reference images for the target views, increasing geometric stability. The video-diffusion prior provides a strong proxy for 3D scenes, allowing the model to learn continuous and pixel-accurate correspondences across generated images. In contrast to approaches relying on autoregressive image generation, which are prone to drift and error accumulation, MultiDiff jointly synthesizes a sequence of frames yielding high-quality and multi-view…
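The abstract's idea of conditioning on "warped reference images for the target views" can be illustrated with a generic depth-based forward warp. The following is a minimal NumPy sketch, not MultiDiff's implementation: the function name `warp_reference`, the assumption that both views share the intrinsics `K`, and the nearest-pixel splatting with a simple z-test are all illustrative choices. In practice the depth map would come from a monocular depth predictor; here it is just an array.

```python
import numpy as np

def warp_reference(image, depth, K, R, t):
    """Forward-warp a reference image into a target view (illustrative sketch).

    image: (H, W, C) array; depth: (H, W) positive depths in the reference camera.
    K: (3, 3) intrinsics, assumed shared by both views.
    R, t: rotation (3, 3) and translation (3,) mapping reference-camera
          coordinates to target-camera coordinates.
    Returns (warped, mask); mask marks target pixels that received a value.
    """
    H, W = depth.shape
    C = image.shape[-1]

    # Homogeneous pixel grid of the reference view, shape (3, H*W).
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)

    # Unproject to 3D, move into the target camera frame, and project.
    pts_ref = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_tgt = R @ pts_ref + t.reshape(3, 1)
    z = pts_tgt[2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero
    proj = K @ pts_tgt
    u_t = np.round(proj[0] / safe_z).astype(np.int64)
    v_t = np.round(proj[1] / safe_z).astype(np.int64)

    # Keep points that land inside the target image.
    valid = in_front & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)
    idx = np.flatnonzero(valid)
    flat = v_t[idx] * W + u_t[idx]

    # z-test: sort nearest first, then keep the first hit per target pixel.
    order = np.argsort(z[idx], kind="stable")
    idx, flat = idx[order], flat[order]
    _, first = np.unique(flat, return_index=True)
    idx, flat = idx[first], flat[first]

    warped = np.zeros((H * W, C), dtype=image.dtype)
    mask = np.zeros(H * W, dtype=bool)
    warped[flat] = image.reshape(-1, C)[idx]
    mask[flat] = True
    return warped.reshape(H, W, C), mask.reshape(H, W)
```

Target pixels where `mask` is false correspond to regions the reference image does not cover. These are the unobserved areas that the abstract describes as ill-posed and that a generative prior would have to fill in.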
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Satellite Image Processing and Photogrammetry
