Deformable DETR: Deformable Transformers for End-to-End Object Detection

Xizhou Zhu; Weijie Su; Lewei Lu; Bin Li; Xiaogang Wang; Jifeng Dai

arXiv:2010.04159 · cs.CV · March 19, 2021 · 1.9k cites

TL;DR

Deformable DETR replaces the Transformer attention in DETR with deformable attention modules that attend only to a small set of key sampling points around a reference point. It outperforms DETR, especially on small objects, while training for 10 times fewer epochs.

Contribution

The paper presents Deformable DETR, a Transformer-based detector whose deformable attention modules address DETR's slow convergence and limited feature spatial resolution.
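
For orientation, a single-scale deformable attention module can be written roughly as below. The notation is an assumption introduced here for illustration and is not quoted from this page: $z_q$ is the query feature, $p_q$ its 2D reference point, $x$ the input feature map, $M$ the number of attention heads, and $K$ the number of sampling points per head. The offsets $\Delta p_{mqk}$ and the attention weights $A_{mqk}$ are both predicted from $z_q$, and $W_m$, $W'_m$ are learned projections.

```latex
\mathrm{DeformAttn}(z_q, p_q, x)
  = \sum_{m=1}^{M} W_m \left[ \sum_{k=1}^{K} A_{mqk} \cdot W'_m \, x\!\left(p_q + \Delta p_{mqk}\right) \right],
\qquad \sum_{k=1}^{K} A_{mqk} = 1
```

Because each query reads only $K$ sampled locations per head instead of every pixel, the cost per query does not grow with the size of the feature map.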

Findings

01. Outperforms DETR on the COCO benchmark

02. Detects small objects better than DETR

03. Requires 10 times fewer training epochs than DETR

Abstract

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitations of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is released at https://github.com/fundamentalvision/Deformable-DETR.
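
The idea of attending only to a few sampling points around a reference can be sketched in plain Python. This is a minimal single-head, single-scale illustration under stated assumptions, not the released implementation: the function names (`bilinear_sample`, `deformable_attention`) are hypothetical, the offsets and attention logits are passed in directly rather than predicted from the query by linear layers, and the value and output projections are omitted.

```python
import math

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a grid of feature vectors at fractional (x, y).

    feat[row][col] is a list of floats. Neighbours that fall outside the
    grid contribute zero (an assumed zero-padding convention).
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * c
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                for ch in range(c):
                    out[ch] += wy * wx * feat[yi][xi][ch]
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_attention(feat, ref, offsets, logits):
    """Weighted sum of features sampled at K points around a reference.

    ref     : (x, y) reference point in pixel coordinates
    offsets : K (dx, dy) pairs (assumed given; learned in the real model)
    logits  : K unnormalised attention scores (assumed given; learned too)
    """
    weights = softmax(logits)  # attention weights sum to 1 over the K points
    c = len(feat[0][0])
    out = [0.0] * c
    for (dx, dy), a in zip(offsets, weights):
        sampled = bilinear_sample(feat, ref[0] + dx, ref[1] + dy)
        for ch in range(c):
            out[ch] += a * sampled[ch]
    return out

# Usage: a 3x3 single-channel map holding the values 0..8 in row-major order.
feat = [[[float(r * 3 + c)] for c in range(3)] for r in range(3)]
out = deformable_attention(feat, (1.0, 1.0), [(0.5, 0.0), (0.0, -0.5)], [0.0, 0.0])
print(out)  # [3.5]
```

Only two locations are read for this query, regardless of the map size, which is the property the abstract credits for handling higher-resolution feature maps than dense attention.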

Code & Models

Videos

Deformable DETR: Deformable Transformers for End-to-End Object Detection · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Deformable Attention Module · Deformable DETR · Feedforward Network · Softmax · Convolution · Layer Normalization · Dense Connections