Novel View Synthesis using DDIM Inversion

Sehajdeep Singh; A V Subramanyam; Aditya Gupta; Sahil Gupta

arXiv:2508.10688·cs.CV·January 9, 2026

TL;DR

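This paper introduces a lightweight view translation framework that combines DDIM inversion with a camera pose-conditioned translation U-Net (TUNet) to synthesize high-fidelity novel views from a single image, avoiding the costly fine-tuning or from-scratch training of large diffusion backbones that many existing methods require.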

Contribution

It proposes a novel fusion strategy and a camera pose-conditioned translation U-Net that improve detail preservation and generalization in single-image view synthesis.

Findings

1. Outperforms existing methods on the MVImgNet dataset
2. Preserves texture and fine details effectively
3. Reduces blurriness in reconstructed views

Abstract

Synthesizing novel views from a single input image is a challenging task. It requires extrapolating the 3D structure of a scene while inferring details in occluded regions, and maintaining geometric consistency across viewpoints. Many existing methods must fine-tune large diffusion backbones using multiple views or train a diffusion model from scratch, which is extremely expensive. Additionally, they suffer from blurry reconstruction and poor generalization. This gap presents the opportunity to explore an explicit lightweight view translation framework that can directly utilize the high-fidelity generative capabilities of a pretrained diffusion model while reconstructing a scene from a novel view. Given the DDIM-inverted latent of a single input image, we employ a camera pose-conditioned translation U-Net, TUNet, to predict the inverted latent corresponding to the desired target view.…
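
The DDIM inversion step that the abstract builds on can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear beta schedule, and the toy noise predictor `toy_eps` are all assumptions made for illustration, and `toy_eps` stands in for a pretrained diffusion model. TUNet itself is not modeled here; in the paper's pipeline it would take the inverted latent of the source view (plus the camera pose) and predict the inverted latent of the target view.

```python
import math

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) between two noise levels."""
    # Estimate the clean latent implied by the predicted noise.
    x0_pred = [(xi - math.sqrt(1.0 - a_from) * ei) / math.sqrt(a_from)
               for xi, ei in zip(x, eps)]
    # Re-noise that estimate to the requested level with the same noise.
    return [math.sqrt(a_to) * x0 + math.sqrt(1.0 - a_to) * ei
            for x0, ei in zip(x0_pred, eps)]

def ddim_invert(x0, eps_fn, alpha_bars):
    """Clean latent -> inverted latent, stepping toward higher noise levels."""
    x, a_prev = list(x0), 1.0
    for t, a_t in enumerate(alpha_bars):
        x = ddim_move(x, eps_fn(x, t), a_prev, a_t)
        a_prev = a_t
    return x

def ddim_sample(x_t, eps_fn, alpha_bars):
    """Inverted latent -> clean latent, stepping back down the schedule."""
    x = list(x_t)
    for t in range(len(alpha_bars) - 1, -1, -1):
        a_to = alpha_bars[t - 1] if t > 0 else 1.0
        x = ddim_move(x, eps_fn(x, t), alpha_bars[t], a_to)
    return x

def toy_eps(x, t):
    """Hypothetical noise predictor; ignores x, so the round trip is exact."""
    return [0.1 * math.sin(t + 1.0)] * len(x)

alpha_bars = make_alpha_bars(50)
latent = [0.5, -1.0, 2.0]                      # stand-in for an image latent
inverted = ddim_invert(latent, toy_eps, alpha_bars)
recovered = ddim_sample(inverted, toy_eps, alpha_bars)
```

Because `toy_eps` does not depend on the latent, inverting and then sampling recovers the input up to floating-point error. With a real noise-prediction network the inversion is only approximate, since the noise predicted at one level is reused as if it held at the next.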
