Novel View Synthesis using DDIM Inversion
Sehajdeep Singh, A V Subramanyam, Aditya Gupta, Sahil Gupta

TL;DR
This paper introduces a lightweight view translation framework that uses DDIM inversion and a specialized U-Net to synthesize high-fidelity novel views from a single image. It avoids the expensive fine-tuning or from-scratch diffusion training of existing methods, as well as their blurry reconstructions.
Contribution
The paper proposes a novel fusion strategy and a camera pose-conditioned translation U-Net (TUNet) that improve detail preservation and generalization in single-image view synthesis.
Findings
Outperforms existing methods on the MVImgNet dataset
Preserves texture and fine details effectively
Reduces blurriness in reconstructed views
Abstract
Synthesizing novel views from a single input image is a challenging task. It requires extrapolating the 3D structure of a scene while inferring details in occluded regions, and maintaining geometric consistency across viewpoints. Many existing methods must fine-tune large diffusion backbones using multiple views or train a diffusion model from scratch, which is extremely expensive. Additionally, they suffer from blurry reconstruction and poor generalization. This gap presents the opportunity to explore an explicit lightweight view translation framework that can directly utilize the high-fidelity generative capabilities of a pretrained diffusion model while reconstructing a scene from a novel view. Given the DDIM-inverted latent of a single input image, we employ a camera pose-conditioned translation U-Net, TUNet, to predict the inverted latent corresponding to the desired target view.…
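The pipeline in the abstract has three parts: DDIM-invert the source image's latent, translate the inverted latent to the target view, then run deterministic DDIM sampling back to a clean latent. The following is a generic NumPy sketch of that flow, not the authors' code. It assumes a toy linear beta schedule and a noise predictor `eps_fn(x, t)`, and it uses `translate_fn(latent, pose)` as a hypothetical placeholder for where TUNet would map the source-view inverted latent to the target-view one. None of these names come from the paper.

```python
import numpy as np


def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Toy linear beta schedule (assumption). Index 0 is the clean latent (alpha_bar = 1)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def ddim_step(x, eps, a_from, a_to):
    """Deterministic (eta = 0) DDIM update between two noise levels alpha_bar."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_fn, alpha_bar):
    """Map a clean latent to its inverted noisy latent, reusing eps(x_t, t) for each step t -> t+1."""
    x = x0
    for t in range(len(alpha_bar) - 1):
        x = ddim_step(x, eps_fn(x, t), alpha_bar[t], alpha_bar[t + 1])
    return x


def ddim_sample(x_T, eps_fn, alpha_bar):
    """Run deterministic DDIM sampling from the last noise level back to index 0."""
    x = x_T
    for t in range(len(alpha_bar) - 1, 0, -1):
        x = ddim_step(x, eps_fn(x, t), alpha_bar[t], alpha_bar[t - 1])
    return x


def synthesize_view(x_src, pose, eps_fn, translate_fn, alpha_bar):
    """Invert the source latent, translate it (hypothetical TUNet stand-in), then decode."""
    x_T_src = ddim_invert(x_src, eps_fn, alpha_bar)
    x_T_tgt = translate_fn(x_T_src, pose)
    return ddim_sample(x_T_tgt, eps_fn, alpha_bar)


# Usage: a constant noise predictor and an identity "translation" (both assumptions).
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))
const_eps = rng.standard_normal((4, 8, 8))


def eps_fn(x, t):
    # Ignores x and t, which makes DDIM inversion exactly invertible.
    return const_eps


def identity_translate(z, pose):
    # Placeholder: a real TUNet would condition on the camera pose.
    return z


alpha_bar = make_alpha_bar()
inverted = ddim_invert(latent, eps_fn, alpha_bar)
reconstructed = synthesize_view(latent, None, eps_fn, identity_translate, alpha_bar)
print(np.allclose(reconstructed, latent))
```

With a constant noise predictor, the approximation in DDIM inversion (reusing the noise estimate at x_t for the step to x_{t+1}) introduces no error, so the identity translation reconstructs the input. With a real pretrained U-Net the round trip is only approximate.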
