TANet: Robust 3D Object Detection from Point Clouds with Triple Attention

Zhe Liu; Xin Zhao; Tengteng Huang; Ruolan Hu; Yu Zhou; Xiang Bai

arXiv:1912.05163·cs.CV·December 12, 2019·29 cites

Open Access 2 Repos

TL;DR

TANet significantly improves 3D object detection robustness in point clouds by introducing a Triple Attention module and a Coarse-to-Fine Regression approach, excelling especially in noisy conditions and achieving top results on the KITTI benchmark.

Contribution

The paper proposes a novel TANet with Triple Attention and Coarse-to-Fine Regression modules, enhancing detection accuracy and robustness in noisy point cloud environments.

Findings

01 Outperforms state-of-the-art methods in noisy scenarios.

02 Ranks first on Pedestrian detection in the KITTI benchmark.

03 Operates at around 29 frames per second.

Abstract

In this paper, we focus on exploring the robustness of 3D object detection in point clouds, which has rarely been discussed in existing approaches. We observe two crucial phenomena: 1) the detection accuracy for hard objects, e.g., Pedestrians, is unsatisfactory, and 2) when additional noise points are added, the performance of existing approaches decreases rapidly. To alleviate these problems, a novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. By considering channel-wise, point-wise and voxel-wise attention jointly, the TA module enhances the crucial information of the target while suppressing the unstable cloud points. Besides, the novel stacked TA further exploits multi-level feature attention. In addition, the CFR module boosts the accuracy of localization without excessive computation…
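The joint use of channel-wise, point-wise and voxel-wise attention described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how each attention score is computed, so everything concrete here is an assumption made for illustration: the voxel tensor layout `(K, N, C)`, max pooling to summarize each axis, a single linear map per branch, sigmoid gating, and the hypothetical names `triple_attention`, `w_point`, `w_channel` and `w_voxel`. It is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping scores into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def triple_attention(voxels, w_point, w_channel, w_voxel):
    """Reweight voxel features with three jointly applied attention gates.

    voxels:    (K, N, C) array of K voxels, N points per voxel, C channels.
    w_point:   (N, N) hypothetical weights for the point-wise branch.
    w_channel: (C, C) hypothetical weights for the channel-wise branch.
    w_voxel:   (C,)   hypothetical weights for the voxel-wise branch.
    """
    # Point-wise branch: summarize each point over its channels, then score points.
    point_score = voxels.max(axis=2) @ w_point        # (K, N)

    # Channel-wise branch: summarize each channel over the points, then score channels.
    channel_score = voxels.max(axis=1) @ w_channel    # (K, C)

    # Combine both scores into one gate per (point, channel) entry.
    joint_gate = sigmoid(point_score[:, :, None] * channel_score[:, None, :])
    refined = voxels * joint_gate                     # (K, N, C)

    # Voxel-wise branch: one scalar gate per voxel from its mean refined feature.
    voxel_gate = sigmoid(refined.mean(axis=1) @ w_voxel)   # (K,)
    return refined * voxel_gate[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(4, 5, 3))   # 4 voxels, 5 points each, 3 channels
    gated = triple_attention(
        features,
        rng.normal(size=(5, 5)),
        rng.normal(size=(3, 3)),
        rng.normal(size=3),
    )
    print(gated.shape)
```

Because every gate lies strictly between 0 and 1, this sketch can only attenuate features, which matches the abstract's framing of suppressing unstable points while keeping the informative ones relatively stronger.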

Taxonomy

Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings