NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection
Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

TL;DR
NeRF-Det++ enhances indoor multi-view 3D detection by integrating semantic cues, perspective-aware sampling, and ordinal depth supervision, leading to improved accuracy over previous methods.
Contribution
The paper introduces three solutions (semantic enhancement, perspective-aware sampling, and ordinal residual depth supervision) to address three shortcomings of NeRF-Det: semantic ambiguity, inappropriate sampling, and insufficient use of depth supervision.
Findings
Outperforms NeRF-Det by +1.9% mAP@0.25 on ScanNetV2
Outperforms NeRF-Det by +3.5% mAP@0.50 on ScanNetV2
Demonstrates effectiveness on ScanNetV2 and ARKITScenes datasets
Abstract
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by innovatively utilizing NeRF to enhance representation learning. Despite its notable performance, we uncover three decisive shortcomings in its current design, including semantic ambiguity, inappropriate sampling, and insufficient utilization of depth supervision. To combat the aforementioned problems, we present three corresponding solutions: 1) Semantic Enhancement. We project the freely available 3D segmentation annotations onto the 2D plane and leverage the corresponding 2D semantic maps as the supervision signal, significantly enhancing the semantic awareness of multi-view detectors. 2) Perspective-aware Sampling. Instead of employing the uniform sampling strategy, we put forward the perspective-aware sampling policy that samples densely near the camera while sparsely in the distance, more effectively…
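The contrast between uniform and perspective-aware sampling can be illustrated with a small sketch. It assumes one common way of getting "dense near the camera, sparse in the distance": spacing samples evenly in inverse depth. This is an illustrative assumption, not necessarily the paper's sampling rule, and the function names and the `near`, `far`, `n` parameters are hypothetical.

```python
def uniform_depths(near, far, n):
    """Return n depths evenly spaced between near and far (inclusive)."""
    if n < 2 or not (0 < near < far):
        raise ValueError("need n >= 2 and 0 < near < far")
    step = (far - near) / (n - 1)
    return [near + i * step for i in range(n)]


def perspective_aware_depths(near, far, n):
    """Return n depths evenly spaced in inverse depth (1/z).

    Assumption for illustration only: equal steps in 1/z place samples
    densely close to the camera and sparsely far away.
    """
    if n < 2 or not (0 < near < far):
        raise ValueError("need n >= 2 and 0 < near < far")
    inv_near, inv_far = 1.0 / near, 1.0 / far
    step = (inv_far - inv_near) / (n - 1)  # negative: 1/z shrinks with depth
    return [1.0 / (inv_near + i * step) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical depth range in metres for an indoor scene.
    for name, fn in [("uniform", uniform_depths),
                     ("perspective-aware", perspective_aware_depths)]:
        depths = fn(0.5, 8.0, 5)
        print(name, [round(d, 3) for d in depths])
```

Both functions cover the same depth range with the same number of samples. The second spends most of that budget on the region close to the camera, where objects occupy more pixels per unit of depth.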
Taxonomy
Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques
