AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

Zehui Chen; Zhenyu Li; Shiquan Zhang; Liangji Fang; Qinghong Jiang; Feng Zhao; Bolei Zhou; Hang Zhao

arXiv:2201.06493·cs.CV·April 22, 2022·1 cites

Open Access

TL;DR

AutoAlign introduces a learnable, data-driven feature fusion method for multi-modal 3D object detection that adaptively aligns image and LiDAR features, improving detection accuracy by 2.3 mAP on KITTI and 7.0 mAP on nuScenes.

Contribution

The paper proposes an automatic feature alignment strategy that replaces the deterministic camera-projection correspondence with a learnable alignment map and a cross-attention module for multi-modal 3D object detection.

Findings

01

Achieves a 2.3 mAP improvement on the KITTI dataset

02

Achieves a 7.0 mAP improvement on the nuScenes dataset

03

Reaches 70.9 NDS on the nuScenes leaderboard

Abstract

Object detection from either RGB images or LiDAR point clouds has been extensively explored in autonomous driving. However, it remains challenging to make these two data sources complementary and beneficial to each other. In this paper, we propose AutoAlign, an automatic feature fusion strategy for 3D object detection. Instead of establishing a deterministic correspondence with the camera projection matrix, we model the mapping relationship between the image and point clouds with a learnable alignment map. This map enables our model to automate the alignment of non-homogeneous features in a dynamic and data-driven manner. Specifically, a cross-attention feature alignment module is devised to adaptively aggregate pixel-level image features for each voxel. To enhance the semantic consistency during feature alignment, we also design a self-supervised cross-modal feature…
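
The cross-attention alignment mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes standard scaled dot-product attention, with each voxel feature acting as a query over all flattened image pixels; the abstract does not spell out the exact formulation, so this is an illustration of the general idea rather than the authors' module. The function name `cross_attention_align`, the projection matrices `w_q`, `w_k`, `w_v`, and all tensor sizes are hypothetical, and random weights stand in for parameters that would be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_align(voxel_feats, image_feats, w_q, w_k, w_v):
    """Aggregate pixel-level image features for each voxel (illustrative only).

    voxel_feats: (N, Cv) features of N non-empty voxels (queries)
    image_feats: (H, W, Ci) image feature map (keys and values)
    Returns the per-voxel aggregated image feature (N, D) and the
    soft alignment map (N, H*W), whose rows sum to 1.
    """
    h, w, ci = image_feats.shape
    pixels = image_feats.reshape(h * w, ci)      # flatten the pixel grid

    q = voxel_feats @ w_q                        # (N, D)
    k = pixels @ w_k                             # (H*W, D)
    v = pixels @ w_v                             # (H*W, D)

    # Soft voxel-to-pixel correspondence instead of a fixed projection.
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N, H*W)
    align_map = softmax(scores, axis=-1)

    fused = align_map @ v                        # (N, D)
    return fused, align_map


# Hypothetical sizes; random weights stand in for learned parameters.
rng = np.random.default_rng(0)
N, Cv, Ci, D, H, W = 5, 16, 8, 12, 4, 6
voxel_feats = rng.normal(size=(N, Cv))
image_feats = rng.normal(size=(H, W, Ci))
w_q = rng.normal(size=(Cv, D)) * 0.1
w_k = rng.normal(size=(Ci, D)) * 0.1
w_v = rng.normal(size=(Ci, D)) * 0.1

fused, align_map = cross_attention_align(voxel_feats, image_feats, w_q, w_k, w_v)
print(fused.shape, align_map.shape)
```

Each row of `align_map` is a distribution over pixels for one voxel, so the aggregated feature is a weighted average of projected pixel features. A deterministic projection would correspond to a one-hot row; the soft map lets training decide where each voxel should look.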

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection