MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps
Yating Xu, Chen Li, Gim Hee Lee

TL;DR
MVSDet is a plane-sweep approach to multi-view indoor 3D object detection. It combines probabilistic sampling with soft weighting and pixel-aligned Gaussian Splatting to recover more accurate geometry, and it outperforms NeRF-based methods.
Contribution
The paper proposes a novel plane sweep method with probabilistic sampling and Gaussian Splatting for more accurate geometry-aware 3D detection in indoor scenes.
Findings
Outperforms NeRF-based methods on ScanNet and ARKitScenes datasets.
Achieves high detection accuracy with low computational overhead.
Demonstrates robustness across multiple indoor datasets.
Abstract
The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. The previous method relies on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet, which utilizes plane sweep for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. For each pixel, we select the multiple locations that score highest in the probability volume and use their probability scores to indicate the confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance…
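The sampling step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `topk_depth_candidates`, the array shapes, the choice of `k=3`, and the random toy inputs are all assumptions made for illustration. It shows only the general idea of keeping the top-scoring depth planes per pixel and reusing their probabilities as soft weights on the pixel features.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def topk_depth_candidates(prob_volume, depth_values, k=3):
    """Hypothetical helper: pick the k most probable depth planes per pixel.

    prob_volume:  (D, H, W) probabilities over D depth planes for each pixel.
    depth_values: (D,) depth of each plane.
    Returns (depths, weights), each of shape (k, H, W), best candidate first.
    """
    # Sort plane indices by descending probability and keep the first k.
    order = np.argsort(-prob_volume, axis=0)[:k]
    # Probability of each selected plane, used as a confidence weight.
    weights = np.take_along_axis(prob_volume, order, axis=0)
    # Metric depth of each selected plane.
    depths = depth_values[order]
    return depths, weights

# Toy inputs (assumed): 8 depth planes, a 4x5 image, 16 feature channels.
rng = np.random.default_rng(0)
prob = softmax(rng.normal(size=(8, 4, 5)), axis=0)
depth_values = np.linspace(0.5, 4.0, 8)
features = rng.normal(size=(16, 4, 5))

depths, weights = topk_depth_candidates(prob, depth_values, k=3)

# Soft weighting: each pixel feature is copied to its k candidate depths,
# scaled by the probability of that candidate. Shape: (k, C, H, W).
weighted_features = weights[:, None] * features[None]
```

In a full pipeline, each weighted feature would then be placed in the 3D volume at the location obtained by back-projecting the pixel to its candidate depth. That step needs camera intrinsics and poses and is omitted here.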
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
