TL;DR
This paper enhances single-image novel view synthesis by integrating sparse multimodal range data, like radar or LiDAR, into diffusion models to improve geometric accuracy and visual quality.
Contribution
It introduces a multimodal depth reconstruction framework using sparse range data with Gaussian Processes, improving view synthesis without altering existing generative models.
Findings
Replacing monocular depth with sparse range-based depth improves visual quality.
The approach enhances geometric consistency in novel view generation.
Sparse multimodal data significantly benefits diffusion-based view synthesis.
Abstract
Diffusion-based approaches have recently demonstrated strong performance for single-image novel view synthesis by conditioning generative models on geometry inferred from monocular depth estimation. However, in practice, the quality and consistency of the synthesized views are fundamentally limited by the reliability of the underlying depth estimates, which are often fragile under low-texture, adverse weather, and occlusion-heavy real-world conditions. In this work, we show that incorporating sparse multimodal range measurements provides a simple yet effective way to overcome these limitations. We introduce a multimodal depth reconstruction framework that leverages extremely sparse range sensing data, such as automotive radar or LiDAR, to produce dense depth maps that serve as robust geometric conditioning for diffusion-based novel view synthesis. Our approach models depth in an angular…
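To make the depth-densification idea concrete, here is a minimal sketch of Gaussian Process regression that interpolates a handful of range returns into a dense depth map over angular coordinates. It is an illustration only, not the paper's implementation: the RBF kernel, the hyperparameter values, the synthetic data, and the names `rbf_kernel` and `gp_densify` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential kernel between two sets of (azimuth, elevation) points."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_densify(angles, depths, query_angles,
               length_scale=0.15, variance=1.0, noise=1e-4):
    """Hypothetical helper: GP posterior mean and variance of depth at query angles.

    angles:       (n, 2) sparse sensor returns as (azimuth, elevation) in radians
    depths:       (n,)   measured range for each return, in metres
    query_angles: (m, 2) angular coordinates where dense depth is wanted
    """
    prior_mean = depths.mean()  # constant prior mean (an assumption of this sketch)
    K = rbf_kernel(angles, angles, length_scale, variance)
    K += noise * np.eye(len(angles))  # sensor noise term, also stabilises the solve
    K_star = rbf_kernel(query_angles, angles, length_scale, variance)

    # Solve with a Cholesky factor instead of forming an explicit inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, depths - prior_mean))
    post_mean = prior_mean + K_star @ alpha

    v = np.linalg.solve(L, K_star.T)
    post_var = variance - np.sum(v**2, axis=0)
    return post_mean, np.maximum(post_var, 0.0)

# Synthetic stand-in for sparse radar/LiDAR returns: 40 points on a slanted surface.
rng = np.random.default_rng(0)
sparse_angles = np.column_stack([
    rng.uniform(-0.5, 0.5, size=40),   # azimuth (rad)
    rng.uniform(-0.2, 0.2, size=40),   # elevation (rad)
])
sparse_depths = 10.0 + 5.0 * sparse_angles[:, 0]  # metres

# Dense angular grid, one cell per output depth pixel.
height, width = 32, 64
az, el = np.meshgrid(np.linspace(-0.5, 0.5, width),
                     np.linspace(-0.2, 0.2, height))
grid = np.column_stack([az.ravel(), el.ravel()])

mean, var = gp_densify(sparse_angles, sparse_depths, grid)
dense_depth = mean.reshape(height, width)
dense_var = var.reshape(height, width)

# Sanity check: the posterior mean should closely reproduce the measurements.
fit_at_samples, _ = gp_densify(sparse_angles, sparse_depths, sparse_angles)
```

In this sketch `dense_depth` plays the role of the geometric conditioning signal, and `dense_var` indicates where the map is extrapolating far from any measurement.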
