VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis

Sen Wang; Qing Cheng; Stefano Gasperini; Wei Zhang; Shun-Cheng Wu; Niclas Zeller; Daniel Cremers; Nassir Navab

arXiv:2311.05289 · cs.CV · December 5, 2024 · 1 citation

Open Access

TL;DR

VoxNeRF uses a voxel-encoded geometry prior to guide ray sampling and adds a robust depth loss, improving both the quality of indoor view synthesis and the time needed for training and rendering.

Contribution

The paper presents a voxel-guided sampling technique and a depth loss, both of which leverage geometry priors for faster, higher-quality indoor view synthesis.

Findings

01. Outperforms state-of-the-art methods on ScanNet datasets

02. Reduces training and rendering time significantly

03. Establishes new benchmarks for indoor view synthesis

Abstract

The generation of high-fidelity view synthesis is essential for robotic navigation and interaction but remains challenging, particularly in indoor environments and real-time scenarios. Existing techniques often require significant computational resources for both training and rendering, and they frequently result in suboptimal 3D representations due to insufficient geometric structuring. To address these limitations, we introduce VoxNeRF, a novel approach that utilizes easy-to-obtain geometry priors to enhance both the quality and efficiency of neural indoor reconstruction and novel view synthesis. We propose an efficient voxel-guided sampling technique that allocates computational resources selectively to the most relevant segments of rays based on a voxel-encoded geometry prior, significantly reducing training and rendering time. Additionally, we incorporate a robust depth loss to…
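The voxel-guided sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the geometry prior is available as a boolean occupancy grid, and the function name `voxel_guided_samples` and all of its parameters are hypothetical. The ray is first stepped uniformly, and only the depths whose sample points land in occupied voxels are kept, so later network evaluations are spent on the relevant ray segments.

```python
import numpy as np


def voxel_guided_samples(origin, direction, occupancy, voxel_size,
                         grid_min, near, far, n_coarse=64):
    """Return depths along a ray whose sample points fall in occupied voxels.

    Hypothetical helper for illustration only. `occupancy` is assumed to be a
    boolean array of shape (X, Y, Z) encoding the geometry prior, `grid_min`
    the world-space corner of voxel (0, 0, 0), and `voxel_size` the edge
    length of one voxel.
    """
    direction = direction / np.linalg.norm(direction)

    # Coarse, uniformly spaced candidate depths between the near and far planes.
    t = np.linspace(near, far, n_coarse)
    points = origin + t[:, None] * direction

    # Map each candidate point to the integer index of the voxel containing it.
    idx = np.floor((points - grid_min) / voxel_size).astype(int)

    # Candidates outside the grid are treated as empty space.
    inside = np.all((idx >= 0) & (idx < np.array(occupancy.shape)), axis=1)

    # Look up the prior only for in-bounds candidates.
    occupied = np.zeros(n_coarse, dtype=bool)
    hit = idx[inside]
    occupied[inside] = occupancy[hit[:, 0], hit[:, 1], hit[:, 2]]

    # Keep only depths inside occupied voxels.
    return t[occupied]


if __name__ == "__main__":
    grid = np.zeros((4, 4, 4), dtype=bool)
    grid[2, 0, 0] = True  # a single occupied voxel
    depths = voxel_guided_samples(
        origin=np.array([0.0, 0.5, 0.5]),
        direction=np.array([1.0, 0.0, 0.0]),
        occupancy=grid,
        voxel_size=1.0,
        grid_min=np.zeros(3),
        near=0.0,
        far=4.0,
        n_coarse=9,
    )
    print(depths)
```

In this toy setup, only the candidates that fall inside the single occupied voxel survive, so the remaining samples along the ray are never evaluated. A practical system would likely traverse the grid exactly rather than rely on uniform stepping, which can miss voxels thinner than the step size.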

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings