Enhancing Novel View Synthesis via Geometry Grounded Set Diffusion
Farhad G. Zanjani, Hong Cai, Amirhossein Habibian

TL;DR
SetDiff is a geometry-grounded multi-view diffusion framework that enhances novel-view renderings produced by 3D Gaussian Splatting, improving their quality and robustness, especially in challenging autonomous driving scenarios.
Contribution
It introduces a set-based diffusion model that combines explicit 3D priors (pixel-aligned coordinate maps and pose-aware Plücker ray embeddings) with joint processing of a variable number of reference and target views.
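The abstract describes a set mixer that performs global token-level attention across all input views. A minimal sketch of that idea, assuming plain single-head self-attention over the concatenated tokens of every view, might look as follows. The function name `set_mixer`, the weight matrices, and the token shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def set_mixer(view_tokens, w_q, w_k, w_v):
    """Single-head self-attention over the tokens of all views at once.

    view_tokens: list of (n_i, c) arrays, one per view (any number of views).
    w_q, w_k, w_v: (c, c) projections (random stand-ins for learned weights).
    Returns a list of (n_i, c) arrays in the same view order.
    """
    sizes = [t.shape[0] for t in view_tokens]
    x = np.concatenate(view_tokens, axis=0)        # (sum of n_i, c)
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Every token attends to every token of every view (global attention).
    logits = q @ k.T / np.sqrt(x.shape[1])
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    mixed = weights @ v

    # Split the mixed tokens back into one array per view.
    return np.split(mixed, np.cumsum(sizes)[:-1], axis=0)

rng = np.random.default_rng(0)
c = 8
weights = [rng.standard_normal((c, c)) for _ in range(3)]
views = [rng.standard_normal((n, c)) for n in (4, 6, 5)]  # three views
outputs = set_mixer(views, *weights)
```

Because no positional encoding over the view index is used in this sketch, the views behave as an unordered set: reordering the input list only reorders the output list. The same function also accepts any number of views without changes.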
Findings
Improves perceptual fidelity and structural similarity.
Reduces hallucinations under low-signal conditions.
Demonstrates state-of-the-art results on multiple datasets (EUVS, Para-Lane, nuScenes, and DL3DV).
Abstract
We present SetDiff, a geometry-grounded multi-view diffusion framework that enhances novel-view renderings produced by 3D Gaussian Splatting. Our method integrates explicit 3D priors (pixel-aligned coordinate maps and pose-aware Plücker ray embeddings) into a set-based diffusion model capable of jointly processing variable numbers of reference and target views. This formulation enables robust occlusion handling, reduces hallucinations under low-signal conditions, and improves photometric fidelity in visual content restoration. A unified set mixer performs global token-level attention across all input views, supporting scalable multi-camera enhancement while maintaining computational efficiency through latent-space supervision and selective decoding. Extensive experiments on EUVS, Para-Lane, nuScenes, and DL3DV demonstrate significant gains in perceptual fidelity, structural similarity,…
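For readers unfamiliar with Plücker ray embeddings, the sketch below shows one common way to compute them per pixel: each ray is encoded as its unit direction d together with the moment o × d, where o is the camera centre. It assumes a pinhole intrinsics matrix and a camera-to-world pose; the function name and these conventions are illustrative assumptions and may differ from what the paper uses.

```python
import numpy as np

def plucker_ray_embedding(K, cam_to_world, height, width):
    """Return an (H, W, 6) map of Plücker coordinates (d, o x d) per pixel.

    K: (3, 3) pinhole intrinsics; cam_to_world: (4, 4) camera-to-world pose.
    Both conventions are assumptions of this sketch.
    """
    # Pixel centres in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)

    # Back-project to camera-frame directions, then rotate into the world frame.
    dirs_cam = pixels @ np.linalg.inv(K).T
    rotation, origin = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ rotation.T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # The moment o x d is the same for any point chosen on the ray.
    moment = np.cross(origin, dirs_world)
    return np.concatenate([dirs_world, moment], axis=-1)

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]          # camera centre in world coordinates
embedding = plucker_ray_embedding(K, pose, height=48, width=64)
```

The resulting six-channel map is pixel-aligned, so it can be concatenated with image or latent features of the same spatial size to make them pose-aware.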
Taxonomy
Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis
