Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection
Diankun Zhang, Zhijie Zheng, Xueting Bi, Xiaojun Liu

TL;DR
This paper emphasizes the importance of structural information in 3D object detection from point clouds and introduces SARFE, a self-attention module that enhances structural features, significantly improving detection performance.
Contribution
The paper identifies structural information as the key factor in 3D detection and proposes SARFE, a plug-and-play self-attention module that boosts the performance of existing detectors.
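To make the idea of a self-attention module over RoI features concrete, here is a minimal sketch. It is not the authors' SARFE architecture, which this summary does not specify. It assumes single-head scaled dot-product self-attention with a residual connection, applied to the feature vectors of one RoI's grid points. The function name `self_attention`, the 6x6x6 grid size, the channel count, and the random projection matrices are all illustrative assumptions.

```python
import numpy as np

def self_attention(roi_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a residual connection.

    roi_feats: (G, C) array, one feature vector per RoI grid point.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical; random here).
    Returns the refined (G, C) features and the (G, G) attention weights.
    """
    q = roi_feats @ w_q
    k = roi_feats @ w_k
    v = roi_feats @ w_v
    # Pairwise similarity between every pair of grid points in the RoI.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each grid point aggregates context from all others, then adds it to itself.
    return roi_feats + weights @ v, weights

rng = np.random.default_rng(0)
G, C = 6 * 6 * 6, 32  # assumed grid resolution and channel count
roi_feats = rng.standard_normal((G, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, weights = self_attention(roi_feats, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Because the output has the same shape as the input, a block like this can sit between an existing RoI feature extractor and the detection head without changing either, which is one plausible reading of "plug-and-play".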
Findings
SARFE improves cyclist detection on the KITTI dataset.
Voxel features contain more structural information than point features.
SARFE maintains real-time detection capability.
Abstract
Unlike 2D object detection, where all RoI features come from grid pixels, RoI feature extraction in 3D point cloud object detection is more diverse. In this paper, we first compare and analyze the differences in structure and performance between two state-of-the-art models, PV-RCNN and Voxel-RCNN. We find that the performance gap between the two models comes not from point information but from structural information. Voxel features contain more structural information because they are produced by quantizing the point cloud rather than downsampling it, so they retain essentially the complete information of the whole point cloud. In our experiments, this stronger structural information gives the detector higher performance even though voxel features lack accurate location information. Then, we propose that structural information is the key to 3D object…
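The contrast between quantization and downsampling described in the abstract can be illustrated with a toy example. This is a sketch under stated assumptions, not the preprocessing used by PV-RCNN or Voxel-RCNN: the points are uniform random samples rather than LiDAR data, the helper names `voxelize` and `random_downsample` are hypothetical, and random sampling stands in for whatever downsampling scheme a detector actually uses. Voxelization assigns every point to a cell and summarizes it there, so all occupied regions remain represented; downsampling to the same budget discards points outright.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Quantize points to integer voxel indices.

    Returns the occupied voxel indices (V, 3) and the mean point per voxel (V, 3).
    """
    idx = np.floor(points / voxel_size).astype(np.int64)
    voxels, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    sums = np.zeros((len(voxels), points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate every point into its voxel
    return voxels, sums / counts[:, None]

def random_downsample(points, n, rng):
    """Keep n points chosen uniformly at random, discarding the rest."""
    keep = rng.choice(len(points), size=n, replace=False)
    return points[keep]

rng = np.random.default_rng(0)
voxel_size = 0.5
points = rng.uniform(0.0, 4.0, size=(2000, 3))  # toy cloud, not real LiDAR

voxels, centroids = voxelize(points, voxel_size)
sampled = random_downsample(points, len(voxels), rng)
sampled_voxels = np.unique(
    np.floor(sampled / voxel_size).astype(np.int64), axis=0
)

# Same element budget, but the sampled set can only cover a subset of the cells.
print("cells represented by voxelization:", len(voxels))
print("cells represented by downsampling:", len(sampled_voxels))
```

With the same number of output elements, the voxel representation covers every occupied cell by construction, while the sampled points can cover at most that many and typically fewer.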
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
