AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection

Zehui Chen; Zhenyu Li; Shiquan Zhang; Liangji Fang; Qinhong Jiang; Feng Zhao

arXiv:2207.10316·cs.CV·July 22, 2022·20 cites

TL;DR

AutoAlignV2 is a faster, more efficient successor to AutoAlign for multi-modal 3D object detection. It fuses point clouds and RGB images through deformable feature aggregation and reports state-of-the-art results on nuScenes (72.4 NDS on the test leaderboard).

Contribution

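The paper proposes the Cross-Domain DeformCAFA module, which aggregates cross-modal features by attending to sparse learnable sampling points rather than using AutoAlign's costly global attention, together with a dynamic inference scheme. Both are reported to improve speed and accuracy over previous methods.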

Findings

01

Achieves 72.4 NDS on nuScenes test leaderboard.

02

Outperforms previous multi-modal 3D detectors in accuracy.

03

Demonstrates improved efficiency and robustness in multi-modal fusion.
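For readers unfamiliar with the metric in the first finding, the snippet below is a minimal sketch of how the nuScenes Detection Score (NDS) combines mean average precision with the five mean true-positive errors. The function name `nds` and the input values are hypothetical and chosen only for illustration; they are not the paper's per-metric results.

```python
def nds(mean_ap, tp_errors):
    """Sketch of the nuScenes Detection Score.

    mean_ap   : mAP in [0, 1]
    tp_errors : the five mean true-positive errors
                (translation, scale, orientation, velocity, attribute)
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error terms")
    # Each error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each true-positive score carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical inputs, not values reported in the paper.
example_score = nds(0.5, [0.5, 0.5, 0.5, 0.5, 0.5])
```

A score such as 72.4 NDS is this quantity expressed as a percentage.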

Abstract

Point clouds and RGB images are two general perceptual sources in autonomous driving. The former can provide accurate localization of objects, and the latter is denser and richer in semantic information. Recently, AutoAlign presented a learnable paradigm for combining these two modalities for 3D object detection. However, it suffers from the high computational cost introduced by its global-wise attention. To solve this problem, we propose the Cross-Domain DeformCAFA module in this work. It attends to sparse learnable sampling points for cross-modal relational modeling, which enhances the tolerance to calibration error and greatly speeds up the feature aggregation across different modalities. To overcome the complex GT-AUG under multi-modal settings, we design a simple yet effective cross-modal augmentation strategy based on a convex combination of image patches given their depth information. Moreover,…
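The idea of attending to a few learnable sampling points instead of the whole image can be illustrated with a minimal NumPy sketch. This is not the authors' DeformCAFA implementation: the function names, the single-query setup, and the linear layers `w_offset` and `w_attn` are assumptions made for illustration, and the official PyTorch repository should be consulted for the actual design.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Bilinearly sample a feature map feat of shape (H, W, C) at pixel (x, y)."""
    h, w, _ = feat.shape
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


def sparse_aggregate(query, ref_xy, img_feat, w_offset, w_attn):
    """Aggregate image features at K sampling points around a reference point.

    query    : (D,) feature of one 3D element (e.g. a voxel), assumed given
    ref_xy   : (x, y) projection of that element onto the image plane
    img_feat : (H, W, C) image feature map
    w_offset : (D, 2K) hypothetical linear layer predicting K (dx, dy) offsets
    w_attn   : (D, K) hypothetical linear layer predicting K attention logits
    """
    k = w_attn.shape[1]
    offsets = (query @ w_offset).reshape(k, 2)
    logits = query @ w_attn
    attn = np.exp(logits - logits.max())
    attn = attn / attn.sum()  # softmax over the K sampling points
    samples = np.stack([
        bilinear_sample(img_feat, ref_xy[0] + dx, ref_xy[1] + dy)
        for dx, dy in offsets
    ])
    # Cost scales with K, not with the H * W positions of global attention.
    return attn @ samples, attn
```

Because the offsets are predicted from the query, the sampling locations can shift away from the projected reference point, which is one plausible reading of why such a scheme tolerates small calibration errors.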

Code & Models

Repositories

zehuichen123/autoalignv2 (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection

Methods: Test · Dropout