Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai

TL;DR
Deformable DETR modifies the Transformer architecture with deformable attention, which attends only to a small set of key sampling points around a reference point. It outperforms DETR on object detection, especially for small objects, while training in 10 times fewer epochs.
Contribution
The paper presents Deformable DETR, a Transformer-based detector whose deformable attention modules attend only to a small set of key sampling points around a reference point. This addresses DETR's slow convergence and limited feature spatial resolution.
Findings
Outperforms DETR on COCO benchmark
Achieves better small object detection
Requires 10 times fewer training epochs
Abstract
DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is released at https://github.com/fundamentalvision/Deformable-DETR.
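The idea of attending only to a small set of sampling points around a reference can be illustrated with a minimal NumPy sketch. It is a simplified single-head, single-scale version and not the released implementation. The names `deformable_attention`, `bilinear_sample`, `W_off`, `W_attn`, and `W_val` are hypothetical, chosen for this example. The sketch assumes the offsets and attention weights are linear projections of the query and that features at fractional locations are read by bilinear interpolation.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample feat (H, W, C) at continuous pixel coords (x, y); zero outside the map."""
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < W and 0 <= yi < H:
                # Bilinear weight of the neighbouring grid cell.
                w = (1 - abs(x - xi)) * (1 - abs(y - yi))
                out += w * feat[yi, xi]
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def deformable_attention(query, ref_point, feat, W_off, W_attn, W_val):
    """
    Single-head, single-scale sketch (hypothetical names and shapes).
    query:     (C,)       query feature
    ref_point: (x, y)     reference location in pixel coordinates
    feat:      (H, W, C)  image feature map
    W_off:     (C, 2K)    projects the query to K 2-D sampling offsets
    W_attn:    (C, K)     projects the query to K attention logits
    W_val:     (C, C)     value projection
    """
    K = W_attn.shape[1]
    offsets = (query @ W_off).reshape(K, 2)   # K offsets around the reference
    weights = softmax(query @ W_attn)         # K attention weights, sum to 1
    values = feat @ W_val                     # projected values, (H, W, C)
    out = np.zeros(W_val.shape[1])
    for k in range(K):
        x = ref_point[0] + offsets[k, 0]
        y = ref_point[1] + offsets[k, 1]
        out += weights[k] * bilinear_sample(values, x, y)
    return out

# Usage: one query attending to K = 3 sampling points on an 8x8 feature map.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))
query = rng.normal(size=4)
out = deformable_attention(
    query, (3.0, 5.0), feat,
    W_off=0.5 * rng.normal(size=(4, 6)),
    W_attn=rng.normal(size=(4, 3)),
    W_val=rng.normal(size=(4, 4)),
)
```

In this sketch each query reads only K locations, so the per-query cost does not grow with the size H x W of the feature map, unlike dense attention over every pixel.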
Code & Models
- 🤗 SenseTime/deformable-detr-single-scale-dc5 · model · 26 dl
- 🤗 SenseTime/deformable-detr-single-scale · model · 346 dl
- 🤗 SenseTime/deformable-detr-with-box-refine-two-stage · model · 528 dl · ♡ 2
- 🤗 SenseTime/deformable-detr-with-box-refine · model · 3.8k dl · ♡ 4
- 🤗 SenseTime/deformable-detr · model · 17k dl · ♡ 21
- 🤗 facebook/deformable-detr-detic · model · 147 dl · ♡ 8
- 🤗 facebook/deformable-detr-box-supervised · model · 20 dl
- 🤗 FoamoftheSea/pvt_v2_b0 · model · 15 dl
- 🤗 OpenGVLab/pvt_v2_b0 · model · 3.6k dl · ♡ 3
- 🤗 OpenGVLab/pvt_v2_b1 · model · 13 dl · ♡ 1
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Deformable Attention Module · Deformable DETR · Feedforward Network · Softmax · Convolution · Layer Normalization · Dense Connections
