TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai

TL;DR
TANet improves the robustness of 3D object detection on point clouds with a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. It is especially strong when noise points are added and ranks first for Pedestrian detection on the KITTI benchmark.
Contribution
The paper proposes TANet, a detector built around a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module, which improves detection accuracy and robustness on noisy point clouds.
Findings
Outperforms state-of-the-art methods in noisy scenarios.
Ranks first for Pedestrian detection on the KITTI benchmark.
Operates at around 29 frames per second.
Abstract
In this paper, we focus on the robustness of 3D object detection in point clouds, which has rarely been discussed in existing approaches. We observe two crucial phenomena: 1) the detection accuracy on hard objects, e.g., pedestrians, is unsatisfactory; 2) when extra noise points are added, the performance of existing approaches drops rapidly. To alleviate these problems, we introduce TANet, which mainly consists of a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. By jointly considering channel-wise, point-wise and voxel-wise attention, the TA module enhances the crucial information of the target while suppressing unstable points. Moreover, the stacked TA structure further exploits multi-level feature attention. In addition, the CFR module boosts localization accuracy without excessive computation…
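The abstract describes the TA module only at a high level. The NumPy sketch below shows one plausible way to combine point-wise, channel-wise, and voxel-wise attention over a batch of voxels; it is a simplified illustration, not the authors' implementation. The max-pooling choices, the two-layer MLPs, the outer-product fusion of point and channel scores, the use of voxel centres for the voxel-wise weight, and all names (`triple_attention`, `mlp`, `params`) are assumptions made for this example, and the weights are random rather than learned.

```python
import numpy as np

# Hypothetical, simplified Triple Attention (TA) sketch with random weights;
# an illustration of the idea, not the TANet reference implementation.

def sigmoid(x):
    # Clip to keep np.exp from overflowing on large-magnitude inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -50.0, 50.0)))

def mlp(x, w1, w2):
    # Two-layer perceptron with a ReLU in between (assumed architecture).
    return np.maximum(x @ w1, 0.0) @ w2

def triple_attention(feats, centers, params):
    """feats: (V, N, C) features of N points in each of V voxels.
    centers: (V, 3) voxel centre coordinates.
    Returns reweighted features with the same shape as feats."""
    # Point-wise attention: max-pool over channels -> (V, N), score each point.
    s = mlp(feats.max(axis=2), params["pw1"], params["pw2"])
    # Channel-wise attention: max-pool over points -> (V, C), score each channel.
    t = mlp(feats.max(axis=1), params["cw1"], params["cw2"])
    # Fuse both into one (V, N, C) mask via an outer product, squashed to [0, 1].
    mask = sigmoid(s[:, :, None] * t[:, None, :])
    f = feats * mask
    # Voxel-wise attention: pooled voxel feature plus voxel centre -> one weight per voxel.
    voxel_desc = np.concatenate([f.max(axis=1), centers], axis=1)  # (V, C + 3)
    q = sigmoid(voxel_desc @ params["vw"])  # (V, 1)
    return f * q[:, :, None]

rng = np.random.default_rng(0)
V, N, C, H = 4, 8, 16, 4  # voxels, points per voxel, channels, hidden width
params = {
    "pw1": rng.normal(scale=0.1, size=(N, H)), "pw2": rng.normal(scale=0.1, size=(H, N)),
    "cw1": rng.normal(scale=0.1, size=(C, H)), "cw2": rng.normal(scale=0.1, size=(H, C)),
    "vw": rng.normal(scale=0.1, size=(C + 3, 1)),
}
feats = rng.normal(size=(V, N, C))
centers = rng.normal(size=(V, 3))
out = triple_attention(feats, centers, params)
print(out.shape)  # (4, 8, 16)
```

Because both the fused point/channel mask and the voxel weight lie between 0 and 1, this block can only scale features down, which is one way attention of this kind can suppress unstable or noisy points. The stacked TA mentioned in the abstract would presumably apply several such blocks in sequence, each with its own weights; that arrangement is likewise an assumption here.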
Taxonomy
Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Human Pose and Action Recognition
