Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds
Bei Wang, Jianping An, Jiayan Cao

TL;DR
Voxel-FPN introduces a multi-scale voxel feature aggregation method for one-stage 3D object detection from point clouds. It aims to extract better features and to balance speed and accuracy in autonomous driving scenarios.
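The sketch below illustrates what multi-scale voxelization involves. It groups LiDAR points into voxel grids at several voxel sizes and summarizes each occupied voxel by the mean of its points. The function names, the base voxel size and the 1x/2x/4x scale factors are hypothetical choices made for illustration. The per-voxel mean stands in for a learned voxel feature encoder. This is a simplified sketch, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: Sequence[Point], voxel_size: Tuple[float, float, float]) -> Dict[VoxelKey, Point]:
    """Bucket points into voxels of `voxel_size` and return the mean point of each occupied voxel.

    Assumption: the mean is a placeholder for a learned voxel feature encoder.
    """
    vx, vy, vz = voxel_size
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis.
        key = (math.floor(x / vx), math.floor(y / vy), math.floor(z / vz))
        buckets[key].append((x, y, z))
    # zip(*pts) regroups the points per axis, so each entry is the per-axis mean.
    return {
        key: tuple(sum(coords) / len(pts) for coords in zip(*pts))
        for key, pts in buckets.items()
    }


def multi_scale_voxelize(points: Sequence[Point],
                         base_size: Tuple[float, float, float],
                         scales: Sequence[int] = (1, 2, 4)) -> Dict[int, Dict[VoxelKey, Point]]:
    """Voxelize the same cloud at several resolutions (hypothetical scale factors)."""
    return {s: voxelize(points, tuple(d * s for d in base_size)) for s in scales}


cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (0.5, 0.5, 0.5)]
grids = multi_scale_voxelize(cloud, base_size=(0.2, 0.2, 0.2))
print({s: len(v) for s, v in grids.items()})  # → {1: 3, 2: 2, 4: 1}
```

Coarser voxels merge nearby points. At scale 2, the first two points share one voxel, and at scale 4 all three do. This is why each scale captures a different level of spatial detail.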
Contribution
It presents a one-stage 3D object detector whose encoder-decoder architecture fuses voxel features across multiple scales. On the KITTI-3D benchmark, it outperforms several existing baselines.
Findings
Better feature extraction from point cloud data.
Superior performance on KITTI-3D benchmark.
Effective balance of speed and accuracy.
Abstract
Object detection in point cloud data is one of the key components of computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-FPN, a novel one-stage 3D object detector that uses only raw data from LIDAR sensors. The core framework consists of an encoder network and a corresponding decoder, followed by a region proposal network. The encoder extracts multi-scale voxel information in a bottom-up manner, while the decoder fuses feature maps from various scales in a top-down manner. Extensive experiments show that the proposed method extracts features from point data more effectively and outperforms several baselines on the challenging KITTI-3D benchmark, achieving good speed and accuracy in real-world scenarios.
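The encoder-decoder flow described above resembles a feature pyramid network. The sketch below makes several simplifying assumptions. It uses single-channel bird's-eye-view grids and replaces learned layers with fixed operations. The bottom-up encoder uses 2x2 average pooling. The top-down decoder uses nearest-neighbour upsampling plus element-wise addition, as in the original FPN. All names and operator choices are assumptions, not the paper's architecture. In the actual detector, learned convolutions would be used, and the fused maps would feed the region proposal network.

```python
from typing import List

Grid = List[List[float]]


def avg_pool2(grid: Grid) -> Grid:
    """Halve resolution with 2x2 average pooling (stand-in for a strided conv); assumes even sizes."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [[(grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
              + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def upsample2(grid: Grid) -> Grid:
    """Double resolution with nearest-neighbour upsampling (stand-in for a deconvolution)."""
    return [[grid[i // 2][j // 2] for j in range(2 * len(grid[0]))]
            for i in range(2 * len(grid))]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized grids."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(bev: Grid, levels: int = 3) -> List[Grid]:
    """Bottom-up pass: build feature maps at successively coarser scales (finest first)."""
    maps = [bev]
    for _ in range(levels - 1):
        maps.append(avg_pool2(maps[-1]))
    return maps


def decode(maps: List[Grid]) -> List[Grid]:
    """Top-down pass: start at the coarsest map, upsample it, and add the lateral map at each finer scale."""
    fused = [maps[-1]]
    for lateral in reversed(maps[:-1]):
        fused.append(add(lateral, upsample2(fused[-1])))
    return list(reversed(fused))  # finest first, matching encode()


bev = [[1.0] * 4 for _ in range(4)]
pyramid = encode(bev, levels=3)
fused = decode(pyramid)
print([len(m) for m in pyramid], [m[0][0] for m in fused])  # → [4, 2, 1] [3.0, 2.0, 1.0]
```

For a 4x4 grid of ones, the fused values 3, 2 and 1 show how each finer level accumulates context from every coarser level above it. This accumulation is the point of the top-down pass.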
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
