DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, and Lei Zhang

arXiv:2307.12972·cs.CV·July 25, 2023

Open Access

TL;DR

DFA3D is a 3D deformable attention operator for 2D-to-3D feature lifting in 3D object detection. It alleviates the depth ambiguity of 2D attention-based lifting and, used within a Transformer-like architecture, allows the lifted features to be refined rather than produced in a single pass.

Contribution

We propose DFA3D, a new operator for 2D-to-3D feature lifting that alleviates depth ambiguity and refines features iteratively, with a memory-efficient implementation and demonstrated improvements on nuScenes.

Findings

1. +1.41% mAP improvement on nuScenes
2. Up to +15.1% mAP with high-quality depth
3. Effective alleviation of depth ambiguity

Abstract

In this paper, we propose a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem. In contrast, our DFA3D-based method first leverages the estimated depth to expand each view's 2D feature map to 3D and then utilizes DFA3D to aggregate features from the expanded 3D feature maps. With the help of DFA3D, the depth ambiguity problem can be effectively alleviated from the root, and the lifted features…
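The two steps the abstract describes (expanding each view's 2D feature map to 3D with estimated depth, then aggregating features from the expanded map) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `expand_to_3d`, `trilinear_sample`, and `deformable_attention_3d` are hypothetical, the sampling offsets and attention logits are passed in directly rather than predicted from a query, and the expanded volume is materialized explicitly for clarity, which a memory-efficient implementation would presumably avoid.

```python
import numpy as np


def expand_to_3d(feat, depth_prob):
    """Expand a 2D feature map to 3D by weighting it with a depth distribution.

    feat:       (C, H, W) image features for one view.
    depth_prob: (D, H, W) per-pixel depth distribution (sums to 1 over D).
    Returns a (C, D, H, W) volume. Assumption: a plain outer product.
    """
    return feat[:, None, :, :] * depth_prob[None, :, :, :]


def trilinear_sample(vol, point):
    """Trilinearly interpolate vol (C, D, H, W) at a continuous (d, y, x) index.

    Corners that fall outside the volume contribute zero.
    """
    C, D, H, W = vol.shape
    d, y, x = point
    d0, y0, x0 = int(np.floor(d)), int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for dd in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                di, yi, xi = d0 + dd, y0 + dy, x0 + dx
                if 0 <= di < D and 0 <= yi < H and 0 <= xi < W:
                    w = (1 - abs(d - di)) * (1 - abs(y - yi)) * (1 - abs(x - xi))
                    out += w * vol[:, di, yi, xi]
    return out


def deformable_attention_3d(vol, ref_point, offsets, attn_logits):
    """Aggregate features sampled around a 3D reference point.

    ref_point:   (3,) reference location as (d, y, x).
    offsets:     (K, 3) sampling offsets (here given, not learned).
    attn_logits: (K,) unnormalized attention scores, softmaxed below.
    """
    weights = np.exp(attn_logits - attn_logits.max())
    weights /= weights.sum()
    samples = np.stack([trilinear_sample(vol, ref_point + o) for o in offsets])
    return weights @ samples  # (C,) aggregated feature
```

In this sketch, a pixel whose depth distribution is concentrated in one bin contributes its feature only near that depth, so two 3D reference points that project to the same pixel but lie at different depths receive different features. That is the intuition behind alleviating depth ambiguity; the actual operator also handles multiple views, scales, and heads.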

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Human Pose and Action Recognition