Dense Voxel Fusion for 3D Object Detection
Anas Mahmoud, Jordan S. K. Hu, Steven L. Waslander

TL;DR
This paper introduces Dense Voxel Fusion (DVF), a sequential camera-LiDAR fusion method for 3D object detection that builds multi-scale dense voxel feature representations, achieving competitive results on the KITTI and Waymo benchmarks without additional trainable parameters.
Contribution
The paper proposes Dense Voxel Fusion, a sequential fusion method that generates multi-scale dense voxel features, together with a multi-modal training approach that uses projected ground truth 3D boxes. Together they improve 3D detection performance without additional trainable parameters.
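As a rough illustration of what "dense" voxel features can mean, the sketch below assigns an image-derived value to every voxel of a regular grid, including voxels that contain no LiDAR points, by projecting each voxel center into the image. Repeating this for several voxel sizes gives a multi-scale set of grids. This is a minimal sketch under stated assumptions, not the authors' fusion operator: the names `project_point`, `dense_voxel_image_scores`, and `multi_scale_scores` are hypothetical, voxel centers are assumed to be in the camera frame, `P` is assumed to be a 3x4 camera projection matrix, and `score_map` is a hypothetical per-pixel scalar (for example a foreground score) standing in for whatever image feature is fused.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Grid = List[List[List[float]]]


def project_point(P: Sequence[Sequence[float]], x: float, y: float, z: float) -> Optional[Tuple[float, float]]:
    """Project a camera-frame 3D point with a 3x4 matrix P; None if it is behind the camera."""
    u_h = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
    v_h = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
    w_h = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
    if w_h <= 0:
        return None
    return u_h / w_h, v_h / w_h


def dense_voxel_image_scores(
    P: Sequence[Sequence[float]],
    score_map: Sequence[Sequence[float]],
    origin: Tuple[float, float, float],
    voxel_size: float,
    grid_shape: Tuple[int, int, int],
) -> Grid:
    """Hypothetical helper: one image-derived value per voxel of a dense grid.

    Every voxel gets a value whether or not any LiDAR point falls inside it;
    voxels whose centers project behind the camera or outside the image keep 0.0.
    """
    nx, ny, nz = grid_shape
    img_h, img_w = len(score_map), len(score_map[0])
    scores = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for ix in range(nx):
        for iy in range(ny):
            for iz in range(nz):
                # Voxel center computed from the grid alone, not from LiDAR points.
                cx = origin[0] + (ix + 0.5) * voxel_size
                cy = origin[1] + (iy + 0.5) * voxel_size
                cz = origin[2] + (iz + 0.5) * voxel_size
                uv = project_point(P, cx, cy, cz)
                if uv is None:
                    continue
                u, v = math.floor(uv[0]), math.floor(uv[1])  # nearest-lower pixel
                if 0 <= u < img_w and 0 <= v < img_h:
                    scores[ix][iy][iz] = score_map[v][u]
    return scores


def multi_scale_scores(
    P: Sequence[Sequence[float]],
    score_map: Sequence[Sequence[float]],
    origin: Tuple[float, float, float],
    extent: Tuple[float, float, float],
    voxel_sizes: Sequence[float],
) -> Dict[float, Grid]:
    """Hypothetical helper: dense grids covering the same extent at several voxel sizes."""
    out: Dict[float, Grid] = {}
    for vs in voxel_sizes:
        shape = tuple(max(1, round(e / vs)) for e in extent)
        out[vs] = dense_voxel_image_scores(P, score_map, origin, vs, shape)
    return out
```

The triple loop is written for readability; a practical version would project all voxel centers at once with a vectorized array library and would fuse multi-channel image features rather than a single scalar.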
Findings
Ranks 3rd on the KITTI 3D detection benchmark
Significantly improves detection performance on the Waymo Open Dataset
Does not require stereo images or dense depth labels
Abstract
Camera and LiDAR sensor modalities provide complementary appearance and geometric information useful for detecting 3D objects for autonomous vehicle applications. However, current end-to-end fusion methods are challenging to train and underperform state-of-the-art LiDAR-only detectors. Sequential fusion methods suffer from a limited number of pixel and point correspondences due to point cloud sparsity, or their performance is strictly capped by the detections of one of the modalities. Our proposed solution, Dense Voxel Fusion (DVF), is a sequential fusion method that generates multi-scale dense voxel feature representations, improving expressiveness in low point density regions. To enhance multi-modal learning, we train directly with projected ground truth 3D bounding box labels, avoiding noisy, detector-specific 2D predictions. Both DVF and the multi-modal training approach can be…
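Training "with projected ground truth 3D bounding box labels" involves turning each 3D box into an image-plane region. The sketch below shows one common way to do this: project the eight box corners with a 3x4 projection matrix `P` and take their axis-aligned extent, clipped to the image. It assumes a KITTI-style camera frame (x right, y down, z forward, yaw measured about the vertical y axis) and a `center` at the geometric center of the box (KITTI labels store the bottom-center location, so shift by half the height first). The function name and argument layout are hypothetical, and how DVF consumes the resulting 2D regions is not shown.

```python
import math
from typing import Optional, Sequence, Tuple


def project_box_to_2d(
    P: Sequence[Sequence[float]],
    center: Tuple[float, float, float],
    length: float,
    height: float,
    width: float,
    yaw: float,
    image_size: Tuple[int, int],
) -> Optional[Tuple[float, float, float, float]]:
    """Hypothetical helper: 2D box (x1, y1, x2, y2) covering a projected 3D box.

    Assumes camera-frame coordinates (x right, y down, z forward), `center` at the
    geometric center of the box, and `yaw` as a rotation about the y axis.
    Returns None if any corner is behind the camera or the box misses the image.
    """
    cx, cy, cz = center
    c, s = math.cos(yaw), math.sin(yaw)
    us, vs = [], []
    for dx in (-length / 2, length / 2):        # along the heading direction
        for dy in (-height / 2, height / 2):    # vertical offset
            for dz in (-width / 2, width / 2):  # across the heading direction
                # Rotate the local corner offset about the y axis, then translate.
                x = cx + c * dx + s * dz
                y = cy + dy
                z = cz - s * dx + c * dz
                u_h = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
                v_h = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
                w_h = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
                if w_h <= 0:
                    return None  # simplification: skip boxes crossing the camera plane
                us.append(u_h / w_h)
                vs.append(v_h / w_h)
    img_w, img_h = image_size
    # Axis-aligned extent of the projected corners, clipped to the image bounds.
    x1 = min(max(min(us), 0.0), img_w)
    x2 = min(max(max(us), 0.0), img_w)
    y1 = min(max(min(vs), 0.0), img_h)
    y2 = min(max(max(vs), 0.0), img_h)
    if x2 <= x1 or y2 <= y1:
        return None  # the box lies entirely outside the image
    return x1, y1, x2, y2


P = [[100.0, 0.0, 50.0, 0.0], [0.0, 100.0, 50.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(project_box_to_2d(P, (0.0, 0.0, 10.0), 2.0, 2.0, 2.0, 0.0, (100, 100)))
```

Returning `None` for boxes with any corner behind the camera keeps the sketch short; a fuller version would clip the box against the near plane instead of discarding it.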
Videos
Dense Voxel Fusion for 3D Object Detection (YouTube)
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning
