NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection
Chi Huang, Xinyang Li, Yansong Qu, Changli Wu, Xiaofan Li, Shengchuan Zhang, and Liujuan Cao

TL;DR
NeRF-DetS introduces adaptive sampling and view-wise fusion strategies to improve indoor multi-view 3D object detection using implicit representations, achieving significant accuracy gains over previous methods.
Contribution
The paper presents NeRF-DetS, a novel approach with adaptive sampling and efficient multi-view feature fusion for enhanced 3D detection in indoor scenes.
Findings
Outperforms NeRF-Det by +5.02% mAP at an IoU threshold of 0.25
Achieves +5.92% mAP at an IoU threshold of 0.50 over previous methods
Demonstrates consistent improvements on the ARKitScenes dataset
Abstract
In indoor scenes, the diverse distribution of object locations and scales makes the visual 3D perception task a big challenge. Previous works (e.g., NeRF-Det) have demonstrated that implicit representation has the capacity to benefit the visual 3D perception task in indoor scenes with a high amount of overlap between input images. However, previous works cannot fully utilize the advancement of implicit representation because of fixed sampling and simple multi-view feature fusion. In this paper, inspired by sparse-fashion methods (e.g., DETR3D), we propose a simple yet effective method, NeRF-DetS, to address the above issues. NeRF-DetS includes two modules: Progressive Adaptive Sampling Strategy (PASS) and Depth-Guided Simplified Multi-Head Attention Fusion (DS-MHA). Specifically, (1) PASS can automatically sample features of each layer within a dense 3D detector, using offsets predicted…
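The contrast the abstract draws between "simple multi-view feature fusion" and attention-based fusion can be illustrated with a small sketch. This is not the paper's DS-MHA module, whose details are cut off in the abstract above; it only shows, under stated assumptions, how softmax weights over views differ from a plain average. The function names, the per-view `scores` (standing in for the output of some learned layer), and the `valid` visibility mask are all hypothetical.

```python
import math

def mean_fusion(view_feats, valid):
    """Simple fusion: average the features of the views that see the point."""
    kept = [f for f, ok in zip(view_feats, valid) if ok]
    if not kept:
        return [0.0] * len(view_feats[0])
    return [sum(col) / len(kept) for col in zip(*kept)]

def softmax_fusion(view_feats, scores, valid):
    """Attention-style fusion: weight each visible view by softmax(score).

    `scores` is a hypothetical per-view relevance value; in a real model it
    would be predicted by a learned layer, which is not modeled here.
    """
    kept = [(f, s) for f, s, ok in zip(view_feats, scores, valid) if ok]
    if not kept:
        return [0.0] * len(view_feats[0])
    max_score = max(s for _, s in kept)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for _, s in kept]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(kept[0][0])
    return [sum(w * f[d] for w, (f, _) in zip(weights, kept)) for d in range(dim)]

# One 3D sample point seen by three views; the third view does not see it.
feats = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
valid = [True, True, False]

fused_mean = mean_fusion(feats, valid)
fused_equal = softmax_fusion(feats, [0.0, 0.0, 5.0], valid)
fused_skewed = softmax_fusion(feats, [0.0, math.log(3.0), 5.0], valid)
```

With equal scores the softmax weights are uniform, so the result matches the plain average; unequal scores shift the fused feature toward the higher-scoring view, which is the extra freedom an attention-based fusion has over averaging.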
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention
