STD: Sparse-to-Dense 3D Object Detector for Point Cloud
Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, Jiaya Jia

TL;DR
This paper introduces STD, a two-stage 3D object detection framework for point clouds that improves proposal generation and localization accuracy while maintaining high inference speed, outperforming existing methods on KITTI.
Contribution
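The paper proposes a two-stage 3D detection framework built on three components: a point-based proposal generation network that seeds each point with a spherical anchor, a PointsPool layer that converts the sparse interior point features of each proposal into a compact dense representation, and a parallel IoU branch that makes box prediction aware of localization accuracy. Together these improve both accuracy and efficiency.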
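The abstract describes first-stage proposal generation as seeding each point with a spherical anchor. The sketch below illustrates that idea under stated assumptions. Each point scored as likely foreground becomes the center of a sphere with a fixed per-class radius. Greedy non-maximum suppression (NMS) then removes heavily overlapping anchors, using the closed-form intersection volume of two equal spheres. The helper names (`sphere_iou`, `spherical_anchor_nms`), the radius, and both thresholds are hypothetical illustration choices. They are not taken from the paper, and the paper's later box regression from the surviving anchors is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def sphere_iou(c1: Point, c2: Point, r: float) -> float:
    """IoU of two spheres that share the same radius r."""
    d = math.dist(c1, c2)
    if d >= 2 * r:
        return 0.0
    # Volume of the lens-shaped intersection of two equal spheres at distance d.
    inter = math.pi * (4 * r + d) * (2 * r - d) ** 2 / 12
    vol = 4.0 / 3.0 * math.pi * r ** 3
    return inter / (2 * vol - inter)


def spherical_anchor_nms(points: List[Point], scores: List[float], radius: float,
                         score_thresh: float = 0.3, iou_thresh: float = 0.7) -> List[int]:
    """Seed one spherical anchor per point and return indices of the anchors kept.

    Hypothetical sketch: score_thresh and iou_thresh are illustrative values.
    """
    # Drop points unlikely to be foreground, then visit the rest by descending score.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep an anchor only if it does not overlap too much with any anchor already kept.
        if all(sphere_iou(points[i], points[j], radius) < iou_thresh for j in kept):
            kept.append(i)
    return kept


pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(spherical_anchor_nms(pts, [0.9, 0.8, 0.95], radius=2.0))  # [2, 0]
```

Spheres are rotation-invariant, so each anchor needs no heading angle. In this sketch that keeps the overlap test to a single distance computation per anchor pair.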
Findings
Outperforms state-of-the-art methods on the KITTI dataset
Achieves over 10 FPS inference speed
Significantly improves detection on hard samples
Abstract
We present a new two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD). The first stage is a bottom-up proposal generation network that uses the raw point cloud as input to generate accurate proposals by seeding each point with a new spherical anchor. It achieves a high recall with less computation compared with prior works. Then, PointsPool is applied for generating proposal features by transforming their interior point features from sparse expression to compact representation, which saves even more computation time. In box prediction, which is the second stage, we implement a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance. We conduct experiments on the KITTI dataset, and evaluate our method in terms of 3D object and Bird's Eye View (BEV) detection. Our method outperforms…
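The second stage described in the abstract can be sketched in simplified form. PointsPool turns the sparse, variable-length set of points inside a proposal into a fixed-size dense grid. The IoU branch predicts how well each box is localized. The sketch below makes several loud assumptions:

- Points are first expressed in the proposal's canonical frame (translated to the box center and rotated by the negative heading).
- The box is divided into a hypothetical `grid`×`grid`×`grid` voxel grid (6 is used here only for illustration).
- Features inside each voxel are averaged. This is a simplification and not the paper's learned per-voxel encoding.
- The final confidence is taken to be the classification score multiplied by the predicted IoU. That combination is assumed, not quoted from the source.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_canonical(p: Vec3, center: Vec3, yaw: float) -> Vec3:
    """Translate a point into the proposal frame and undo the proposal heading (about z)."""
    dx, dy, dz = p[0] - center[0], p[1] - center[1], p[2] - center[2]
    c, s = math.cos(-yaw), math.sin(-yaw)
    return (c * dx - s * dy, s * dx + c * dy, dz)


def points_pool(points: Sequence[Vec3], feats: Sequence[Sequence[float]],
                center: Vec3, size: Vec3, yaw: float, grid: int = 6) -> List[List[float]]:
    """Scatter sparse interior point features into a dense grid**3 voxel list (mean per voxel).

    Simplified sketch: averaging stands in for a learned per-voxel feature encoder.
    """
    length, width, height = size
    dim = len(feats[0])
    sums = [[0.0] * dim for _ in range(grid ** 3)]
    counts = [0] * (grid ** 3)
    for p, f in zip(points, feats):
        x, y, z = to_canonical(p, center, yaw)
        # Normalize coordinates to [0, 1) inside the box; skip points outside the proposal.
        u, v, w = x / length + 0.5, y / width + 0.5, z / height + 0.5
        if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0 and 0.0 <= w < 1.0):
            continue
        ix, iy, iz = (min(int(t * grid), grid - 1) for t in (u, v, w))
        idx = (ix * grid + iy) * grid + iz
        counts[idx] += 1
        for k in range(dim):
            sums[idx][k] += f[k]
    # Empty voxels stay zero, so every proposal yields a tensor of the same shape.
    return [[s / counts[i] if counts[i] else 0.0 for s in sums[i]] for i in range(grid ** 3)]


def final_score(cls_score: float, pred_iou: float) -> float:
    """Assumed combination: penalize confidently classified but poorly localized boxes."""
    return cls_score * pred_iou


pooled = points_pool([(0.5, 0.5, 0.5), (0.6, 0.6, 0.6), (10.0, 0.0, 0.0)],
                     [[1.0], [3.0], [9.0]], center=(0.0, 0.0, 0.0),
                     size=(6.0, 6.0, 6.0), yaw=0.0)
print(len(pooled), pooled[129])  # 216 [2.0]
```

Because the output shape is fixed regardless of how many points a proposal contains, the downstream box-prediction and IoU heads in this sketch can operate on dense, batched tensors.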
