Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds

Bei Wang; Jianping An; Jiayan Cao

arXiv:1907.05286·cs.CV·July 17, 2019·40 cites

TL;DR

Voxel-FPN introduces a novel multi-scale voxel feature aggregation method for 3D object detection from point clouds, improving feature extraction and performance in autonomous driving scenarios.

Contribution

The paper presents a one-stage 3D object detector whose encoder-decoder architecture fuses multi-scale voxel features, outperforming some baselines on the KITTI-3D benchmark.

Findings

1. Better feature extraction from point cloud data.

2. Superior performance over some baselines on the KITTI-3D benchmark.

3. Effective balance of speed and accuracy.

Abstract

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-FPN, a novel one-stage 3D object detector that uses only raw data from LiDAR sensors. The core framework consists of an encoder network and a corresponding decoder, followed by a region proposal network. The encoder extracts multi-scale voxel information in a bottom-up manner, while the decoder fuses feature maps from the various scales in a top-down way. Extensive experiments show that the proposed method extracts features from point data more effectively and outperforms some baselines on the challenging KITTI-3D benchmark, performing well in both speed and accuracy in real-world scenarios.
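The bottom-up and top-down flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the hand-crafted per-cell features (point count and mean height), the nearest-neighbor upsampling, the additive fusion, and the names `voxelize_bev`, `upsample2x`, and `top_down_fuse` are all assumptions chosen for illustration. The learned voxel feature layers and the region proposal network are omitted.

```python
import numpy as np

def voxelize_bev(points, voxel_size, extent):
    """Scatter (x, y, z) points into a bird's-eye-view grid.

    Each cell stores two hand-crafted features (an assumption, standing in
    for learned voxel features): the point count and the mean height.
    """
    (x_min, x_max), (y_min, y_max) = extent
    nx = int(round((x_max - x_min) / voxel_size))
    ny = int(round((y_max - y_min) / voxel_size))
    ix = np.floor((points[:, 0] - x_min) / voxel_size).astype(int)
    iy = np.floor((points[:, 1] - y_min) / voxel_size).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, z = ix[keep], iy[keep], points[keep, 2]
    count = np.zeros((nx, ny))
    z_sum = np.zeros((nx, ny))
    np.add.at(count, (ix, iy), 1.0)  # unbuffered add handles repeated cells
    np.add.at(z_sum, (ix, iy), z)
    mean_z = np.divide(z_sum, count, out=np.zeros_like(z_sum), where=count > 0)
    return np.stack([count, mean_z], axis=0)  # shape (2, nx, ny)

def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(pyramid):
    """Fuse maps ordered finest to coarsest by adding upsampled coarser maps."""
    fused = [pyramid[-1]]
    for fmap in reversed(pyramid[:-1]):
        fused.append(fmap + upsample2x(fused[-1]))
    return fused[::-1]  # same order as the input: finest first

# Synthetic point cloud (hypothetical data, not from KITTI).
rng = np.random.default_rng(0)
points = rng.uniform([0.0, -8.0, -2.0], [16.0, 8.0, 1.0], size=(1000, 3))
extent = ((0.0, 16.0), (-8.0, 8.0))

# Bottom-up: voxelize the same cloud at three voxel sizes.
pyramid = [voxelize_bev(points, s, extent) for s in (0.5, 1.0, 2.0)]
# Top-down: propagate coarse context back to the finer grids.
fused = top_down_fuse(pyramid)
shapes = [f.shape for f in fused]
```

Each voxel size here doubles the previous one, so a 2x upsampling aligns a coarse grid with the next finer grid. A real detector would replace the addition with learned layers before passing the fused maps to a detection head.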

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings