DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein

TL;DR
DiffDreamer introduces an unsupervised diffusion model framework capable of generating consistent long-range scene extrapolations from single images, outperforming prior GAN-based methods in maintaining scene coherence.
Contribution
The paper presents DiffDreamer, a novel unsupervised diffusion-based approach for long-range scene extrapolation that leverages multiple frames for conditioning, improving consistency and quality.
Findings
Outperforms GAN-based methods in scene consistency
Trains without supervision, using only internet-collected images of nature scenes
Capable of synthesizing long camera trajectories
Abstract
Scene extrapolation -- the idea of generating novel views by flying into a given image -- is a promising, yet challenging task. For each predicted frame, a joint inpainting and 3D refinement problem has to be solved, which is ill posed and includes a high level of ambiguity. Moreover, training data for long-range scenes is difficult to obtain and usually lacks sufficient views to infer accurate camera poses. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. Utilizing the stochastic nature of the guided denoising steps, we train the diffusion models to refine projected RGBD images but condition the denoising steps on multiple past and future frames for inference. We demonstrate that image-conditioned diffusion models can effectively perform…
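The "projected RGBD images" mentioned in the abstract can be illustrated with a small forward-warping sketch. This is not the authors' implementation: the function name `warp_rgbd`, the pinhole camera model, and the assumption that both views share the same intrinsics `K` are all hypothetical choices made for illustration. The sketch unprojects every pixel using its depth, moves the points into a target camera, and splats them back with a z-buffer. Pixels that receive no point are reported as holes, which is the kind of region a refinement model would then have to inpaint.

```python
import numpy as np

def warp_rgbd(rgb, depth, K, R, t):
    """Forward-project an RGBD image into a target camera (illustrative sketch).

    rgb:   (H, W, 3) float array
    depth: (H, W) positive depths in the source camera
    K:     (3, 3) pinhole intrinsics, assumed shared by both cameras
    R, t:  rotation (3, 3) and translation (3,) mapping source-camera
           coordinates to target-camera coordinates
    Returns (warped_rgb, warped_depth, mask); mask is True where a pixel was filled.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject pixels to 3D points in the source camera, then move to the target.
    pts_src = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)  # (3, N)
    pts_tgt = R @ pts_src + np.asarray(t, dtype=np.float64).reshape(3, 1)

    # Project into the target image; drop points behind the camera or off-screen.
    z = pts_tgt[2]
    valid = z > 1e-6
    z_safe = np.where(valid, z, 1.0)
    proj = K @ pts_tgt
    u_t = np.rint(proj[0] / z_safe).astype(int)
    v_t = np.rint(proj[1] / z_safe).astype(int)
    valid &= (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)

    # Z-buffer: for each target pixel keep the nearest point that lands on it.
    flat = v_t[valid] * W + u_t[valid]
    z_valid = z[valid]
    depth_flat = np.full(H * W, np.inf)
    np.minimum.at(depth_flat, flat, z_valid)
    winners = z_valid == depth_flat[flat]

    rgb_flat = np.zeros((H * W, 3), dtype=rgb.dtype)
    rgb_flat[flat[winners]] = rgb.reshape(-1, 3)[valid][winners]

    mask = np.isfinite(depth_flat)
    depth_flat[~mask] = 0.0  # holes carry no depth
    return rgb_flat.reshape(H, W, 3), depth_flat.reshape(H, W), mask.reshape(H, W)

if __name__ == "__main__":
    H = W = 8
    K = np.array([[8.0, 0.0, 3.5], [0.0, 8.0, 3.5], [0.0, 0.0, 1.0]])
    rgb = np.random.default_rng(0).random((H, W, 3))
    depth = np.full((H, W), 2.0)
    # Move the camera one unit forward: points get closer in the target frame.
    _, _, mask = warp_rgbd(rgb, depth, K, np.eye(3), np.array([0.0, 0.0, -1.0]))
    print("filled pixels:", int(mask.sum()), "of", H * W)
```

Flying forward spreads the source pixels apart, so the warped frame contains gaps even for a flat scene. With an identity pose the function returns the input unchanged, which is a quick sanity check.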
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Inpainting · Diffusion
