MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection

Michelle Adeline; Junn Yong Loo; Vishnu Monn Baskaran

arXiv:2406.17654 · cs.RO · November 12, 2024

TL;DR

MDHA introduces a sparse, multi-scale deformable transformer with hybrid anchors for efficient and accurate multi-view 3D object detection, addressing bias and scalability issues of previous methods.

Contribution

The paper proposes a novel hybrid anchor construction and a Circular Deformable Attention mechanism to improve efficiency and performance in multi-view 3D detection.

Findings

01. Achieves 46.4% mAP on the nuScenes val set with a ResNet101 backbone.

02. Outperforms a baseline in which anchors are modelled as learnable embeddings.

03. Introduces an attention mechanism that needs projection onto only a single image while maintaining high accuracy.

Abstract

Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods either depend on dataset-specific initialization of 3D anchors, which introduces bias, or use dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework that constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale image input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-$k$ anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention…
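
As a rough illustration of the step where 2.5D anchors are projected to 3D proposals and the top-k are kept, the sketch below lifts a pixel location plus a predicted depth into 3D with a standard pinhole camera model and then keeps the highest-scoring proposals. This is a generic back-projection under stated assumptions, not necessarily the formulation used in MDHA: the function names `backproject_anchor` and `select_top_k`, the `(fx, fy, cx, cy)` intrinsics tuple, and the row-major 4x4 `cam_to_ego` matrix are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_anchor(
    u: float,
    v: float,
    depth: float,
    intrinsics: Tuple[float, float, float, float],
    cam_to_ego: Sequence[Sequence[float]],
) -> Point3D:
    """Lift a 2.5D anchor (pixel u, v plus depth) to a 3D point.

    Assumes a pinhole camera with intrinsics (fx, fy, cx, cy) and a
    row-major 4x4 camera-to-ego transform; both are illustrative.
    """
    fx, fy, cx, cy = intrinsics
    # Pinhole model: pixel offset from the principal point, scaled by depth.
    x_cam = (u - cx) * depth / fx
    y_cam = (v - cy) * depth / fy
    point_h = (x_cam, y_cam, depth, 1.0)  # homogeneous camera-frame point
    # Apply the first three rows of the 4x4 transform.
    x, y, z = (
        sum(cam_to_ego[row][col] * point_h[col] for col in range(4))
        for row in range(3)
    )
    return (x, y, z)


def select_top_k(
    proposals: Sequence[Point3D], scores: Sequence[float], k: int
) -> List[Point3D]:
    """Keep the k proposals with the highest scores (sparse selection)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [proposals[i] for i in order[:k]]
```

In this sketch the scores stand in for whatever confidence the encoder assigns to each anchor; how MDHA computes and refines them is not described in the abstract above.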

Code & Models

Repositories

naomiex/mdha (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Hand Gesture Recognition Systems · Robotics and Sensor-Based Localization

Methods: Softmax · Attention Is All You Need