Enhancing Novel View Synthesis via Geometry Grounded Set Diffusion

Farhad G. Zanjani; Hong Cai; Amirhossein Habibian

arXiv:2601.07540 · cs.CV · March 16, 2026

Open Access

TL;DR

SetDiff is a geometry-grounded multi-view diffusion framework that enhances novel-view renderings produced by 3D Gaussian Splatting, improving quality and robustness, especially in challenging autonomous driving scenarios.

Contribution

Introduces a set-based diffusion model that combines explicit 3D priors (pixel-aligned coordinate maps and pose-aware Plücker ray embeddings) with scalable joint processing of a variable number of reference and target views.

Findings

01. Improves perceptual fidelity and structural similarity.

02. Reduces hallucinations under low-signal conditions.

03. Demonstrates state-of-the-art results on EUVS, Para-Lane, nuScenes, and DL3DV.

Abstract

We present SetDiff, a geometry-grounded multi-view diffusion framework that enhances novel-view renderings produced by 3D Gaussian Splatting. Our method integrates explicit 3D priors (pixel-aligned coordinate maps and pose-aware Plücker ray embeddings) into a set-based diffusion model capable of jointly processing variable numbers of reference and target views. This formulation enables robust occlusion handling, reduces hallucinations under low-signal conditions, and improves photometric fidelity in visual content restoration. A unified set mixer performs global token-level attention across all input views, supporting scalable multi-camera enhancement while maintaining computational efficiency through latent-space supervision and selective decoding. Extensive experiments on EUVS, Para-Lane, nuScenes, and DL3DV demonstrate significant gains in perceptual fidelity, structural similarity,…
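
The abstract names pose-aware Plücker ray embeddings as one of the explicit 3D priors. As a rough illustration of what such an embedding typically contains, the sketch below computes the standard 6-D Plücker coordinates (unit direction d and moment m = o × d) for each pixel ray of a pinhole camera. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names `plucker_ray` and `plucker_map`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the camera-to-world convention for `R` and `origin`, and the half-pixel centre offset are all hypothetical choices made for this example.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def matvec(m, v):
    """Multiply a 3x3 matrix (nested sequences) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def plucker_ray(u, v, fx, fy, cx, cy, R, origin):
    """Return the 6-D Plücker coordinates (d, m) of the ray through pixel (u, v).

    Assumptions (hypothetical, not taken from the paper):
    - pinhole camera with focal lengths fx, fy and principal point cx, cy
    - R is the 3x3 camera-to-world rotation
    - origin is the camera centre in world coordinates
    """
    # Back-project the pixel to a direction in camera coordinates.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into world coordinates and normalise to unit length.
    d_world = matvec(R, d_cam)
    norm = math.sqrt(sum(c * c for c in d_world))
    d = tuple(c / norm for c in d_world)
    # The moment is the cross product of the camera centre and the direction.
    m = cross(origin, d)
    return d + m


def plucker_map(width, height, fx, fy, cx, cy, R, origin):
    """Build a height x width grid of 6-D Plücker rays, one per pixel centre."""
    return [
        [plucker_ray(x + 0.5, y + 0.5, fx, fy, cx, cy, R, origin)
         for x in range(width)]
        for y in range(height)
    ]
```

Calling `plucker_map` once per reference or target view yields a pixel-aligned 6-channel map that depends on each camera's pose. Every ray satisfies d · m = 0 by construction, which gives a quick sanity check on the output.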

Taxonomy

Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis