MDHA: Multi-Scale Deformable Transformer with Hybrid Anchors for Multi-View 3D Object Detection
Michelle Adeline, Junn Yong Loo, Vishnu Monn Baskaran

TL;DR
MDHA introduces a sparse, multi-scale deformable transformer with hybrid anchors for efficient and accurate multi-view 3D object detection, addressing the dataset-specific anchor bias and the poor scalability of dense attention in previous methods.
Contribution
MDHA proposes a novel hybrid anchor construction scheme and a Circular Deformable Attention mechanism to improve efficiency and performance in multi-view 3D detection.
Findings
Achieves 46.4% mAP on the nuScenes val set with a ResNet101 backbone.
Outperforms baseline models with learnable anchor embeddings.
Introduces a single-image projection attention mechanism that maintains high accuracy.
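The "circular" part of the single-image projection idea can be illustrated with a small sketch. It assumes, purely for illustration, that the camera views are laid side by side as one horizontal ring of equal-width images, so that a sampling location falling off one image's edge lands in the neighbouring view. The function name `wrap_sample`, the ring layout, and the example image width are hypothetical and not taken from the paper.

```python
def wrap_sample(view: int, x: float, num_views: int, width: int) -> tuple:
    """Map a horizontal sampling location that may fall outside its source
    image onto the neighbouring view (illustrative assumption: views form
    a horizontal ring of equal-width images)."""
    # Position on the concatenated ring of all views.
    global_x = view * width + x
    # Wrap around so the last view is adjacent to the first one.
    global_x %= num_views * width
    new_view = int(global_x // width)
    # Return the view index and the local x coordinate inside that view.
    return new_view, global_x - new_view * width


if __name__ == "__main__":
    # A point 50 px past the right edge of the last of 6 views.
    print(wrap_sample(5, 1650.0, 6, 1600))
    # A point 10 px left of the first view's left edge.
    print(wrap_sample(0, -10.0, 6, 1600))
```

Under this assumption, a reference point is projected to one image only, yet its sampling offsets can still reach adjacent images without a second projection.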
Abstract
Multi-view 3D object detection is a crucial component of autonomous driving systems. Contemporary query-based methods either depend on dataset-specific initialization of 3D anchors, which introduces bias, or utilize dense attention mechanisms, which are computationally inefficient and unscalable. To overcome these issues, we present MDHA, a novel sparse query-based framework, which constructs adaptive 3D output proposals using hybrid anchors from multi-view, multi-scale image input. Fixed 2D anchors are combined with depth predictions to form 2.5D anchors, which are projected to obtain 3D proposals. To ensure high efficiency, our proposed Anchor Encoder performs sparse refinement and selects the top-k anchors and features. Moreover, while existing multi-view attention mechanisms rely on projecting reference points to multiple images, our novel Circular Deformable Attention…
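The hybrid anchor step described in the abstract (a fixed 2D anchor plus a predicted depth gives a 2.5D anchor, which is projected to a 3D proposal, followed by top-k selection) can be sketched roughly as follows. This is a minimal illustration that assumes a standard pinhole camera model and camera-frame coordinates. The function names, the intrinsics values, and the use of a single scalar score per anchor are assumptions for the example, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def lift_anchor(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Back-project a 2.5D anchor (pixel u, v plus predicted depth) to a
    3D point in the camera frame, assuming a pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def top_k_proposals(proposals: Sequence[Point3D],
                    scores: Sequence[float], k: int) -> List[Point3D]:
    """Keep the k proposals with the highest scores (hypothetical scalar
    scores standing in for whatever ranking the Anchor Encoder uses)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [proposals[i] for i in order[:k]]


if __name__ == "__main__":
    # Illustrative intrinsics: focal length 1000 px, principal point (800, 450).
    fx, fy, cx, cy = 1000.0, 1000.0, 800.0, 450.0
    anchors_2p5d = [(800.0, 450.0, 10.0), (900.0, 450.0, 10.0), (800.0, 550.0, 20.0)]
    proposals = [lift_anchor(u, v, d, fx, fy, cx, cy) for u, v, d in anchors_2p5d]
    print(top_k_proposals(proposals, scores=[0.2, 0.9, 0.5], k=2))
```

In a multi-camera setup, each camera-frame point would additionally be transformed by that camera's extrinsics into a shared frame; that step is omitted here for brevity.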
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Hand Gesture Recognition Systems · Robotics and Sensor-Based Localization
Methods: Softmax · Attention Is All You Need
