STD: Sparse-to-Dense 3D Object Detector for Point Cloud

Zetong Yang; Yanan Sun; Shu Liu; Xiaoyong Shen; Jiaya Jia

arXiv:1907.10471 · cs.CV · July 25, 2019 · 56 cites

TL;DR

This paper introduces STD, a two-stage 3D object detection framework for point clouds that improves proposal generation and localization accuracy while maintaining high inference speed, outperforming existing methods on KITTI.

Contribution

The paper proposes a novel two-stage 3D detection framework with a new proposal generation method and a parallel IoU branch, achieving higher accuracy and efficiency.
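As a rough illustration of how a parallel IoU branch could influence which boxes are kept, the sketch below ranks candidate boxes by a combined score. It assumes the classification confidence and the predicted IoU are simply multiplied; that combination rule, and the function names `rescore` and `rank_boxes`, are assumptions made for this example rather than details stated on this page.

```python
def rescore(cls_scores, pred_ious):
    """Combine classification confidence with predicted IoU.

    Assumption for this sketch: the combined score is the plain product.
    """
    if len(cls_scores) != len(pred_ious):
        raise ValueError("cls_scores and pred_ious must have the same length")
    return [c * i for c, i in zip(cls_scores, pred_ious)]


def rank_boxes(cls_scores, pred_ious):
    """Return box indices sorted from best to worst combined score."""
    combined = rescore(cls_scores, pred_ious)
    return sorted(range(len(combined)), key=lambda k: combined[k], reverse=True)


if __name__ == "__main__":
    # Box 0 is confidently classified but poorly localized;
    # box 1 is slightly less confident but well localized.
    cls_scores = [0.9, 0.8]
    pred_ious = [0.5, 0.9]
    print(rank_boxes(cls_scores, pred_ious))
```

With these hypothetical numbers, the better-localized box moves ahead of the more confidently classified one, which is the kind of localization awareness the IoU branch is meant to add.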

Findings

01 Outperforms state-of-the-art methods on KITTI dataset

02 Achieves over 10 FPS inference speed

03 Significantly improves detection on hard samples

Abstract

We present a new two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD). The first stage is a bottom-up proposal generation network that uses raw point cloud as input to generate accurate proposals by seeding each point with a new spherical anchor. It achieves a high recall with less computation compared with prior works. Then, PointsPool is applied for generating proposal features by transforming their interior point features from sparse expression to compact representation, which saves even more computation time. In box prediction, which is the second stage, we implement a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance. We conduct experiments on KITTI dataset, and evaluate our method in terms of 3D object and Bird's Eye View (BEV) detection. Our method outperforms…
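The two ideas named in the abstract, gathering points around a seed with a spherical anchor and converting sparse interior point features into a compact grid, can be pictured with a small pure-Python sketch. This is a simplification under stated assumptions, not the authors' implementation: the box is axis-aligned, each grid cell keeps the element-wise maximum of its point features, features are assumed non-negative so that empty cells can stay at zero, and the names `points_in_sphere` and `voxel_pool` are hypothetical.

```python
def points_in_sphere(points, center, radius):
    """Return indices of points lying inside a sphere around `center`."""
    r2 = radius * radius
    return [
        k for k, p in enumerate(points)
        if sum((a - c) ** 2 for a, c in zip(p, center)) <= r2
    ]


def voxel_pool(points, feats, box_min, box_max, grid=(2, 2, 2)):
    """Max-pool per-point features into a dense grid over an axis-aligned box.

    Assumptions for this sketch: features are non-negative, and empty
    cells are left as zero vectors.
    """
    dim = len(feats[0]) if feats else 0
    gx, gy, gz = grid
    cells = [[0.0] * dim for _ in range(gx * gy * gz)]
    sizes = [(hi - lo) / g for lo, hi, g in zip(box_min, box_max, grid)]
    for p, f in zip(points, feats):
        idx = []
        for a, lo, hi, s, g in zip(p, box_min, box_max, sizes, grid):
            if not (lo <= a <= hi):
                break  # point lies outside the box on this axis
            # Clamp so a point exactly on the upper face lands in the last cell.
            idx.append(min(int((a - lo) / s), g - 1))
        else:
            flat = (idx[0] * gy + idx[1]) * gz + idx[2]
            cells[flat] = [max(c, v) for c, v in zip(cells[flat], f)]
    return cells


if __name__ == "__main__":
    points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9), (5.0, 5.0, 5.0)]
    feats = [[1.0], [3.0], [2.0], [9.0]]
    inside = points_in_sphere(points, (0.5, 0.5, 0.5), 1.0)
    pooled = voxel_pool(
        [points[k] for k in inside],
        [feats[k] for k in inside],
        (0.0, 0.0, 0.0),
        (1.0, 1.0, 1.0),
    )
    print(inside, pooled)
```

However many points fall inside the box, the output always has one feature vector per grid cell, so a later stage can work on a fixed-size input.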

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings