DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen

TL;DR
RT-DETR is a novel real-time end-to-end Transformer-based object detector that outperforms YOLOs in both speed and accuracy by optimizing the architecture and query selection, making it practical for real-world applications.
Contribution
This paper introduces RT-DETR, which the authors describe as the first real-time end-to-end Transformer detector. It combines an efficient hybrid encoder with uncertainty-minimal query selection to improve both speed and accuracy.
Findings
RT-DETR with a ResNet-50 backbone achieves 53.1% AP on COCO at 108 FPS on a T4 GPU.
RT-DETR outperforms YOLOs and DINO in speed and accuracy on multiple benchmarks.
Supports flexible speed tuning without retraining, by adjusting the number of decoder layers used at inference.
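In the paper, the speed-accuracy trade-off without retraining comes from running fewer decoder layers at inference time. The toy sketch below illustrates only that idea. The names `make_toy_layer` and `run_decoder` are hypothetical, and each "layer" is a trivial stand-in function, not an RT-DETR decoder layer.

```python
from typing import Callable, List, Optional, Sequence

# A "layer" here is any function that refines a list of query values.
Layer = Callable[[List[float]], List[float]]


def make_toy_layer(step: float) -> Layer:
    """Hypothetical stand-in for one decoder layer: shifts every query by `step`."""
    def layer(queries: List[float]) -> List[float]:
        return [q + step for q in queries]
    return layer


def run_decoder(
    queries: Sequence[float],
    layers: Sequence[Layer],
    num_layers: Optional[int] = None,
) -> List[float]:
    """Run only the first `num_layers` layers of an already-built stack.

    The stack itself is unchanged, so no retraining is involved; fewer
    layers means less compute and a coarser refinement.
    """
    if num_layers is None:
        num_layers = len(layers)
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    out = list(queries)
    for layer in layers[:num_layers]:
        out = layer(out)
    return out


# Build a six-layer toy stack once, then choose the depth per call.
toy_layers = [make_toy_layer(1.0) for _ in range(6)]
full_output = run_decoder([0.0, 10.0], toy_layers)                 # all 6 layers
fast_output = run_decoder([0.0, 10.0], toy_layers, num_layers=3)   # first 3 only
```

In a real detector this kind of truncation only works if the intermediate layers were trained to produce usable predictions on their own, which is an assumption this sketch does not model.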
Abstract
The YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy. However, we observe that the speed and accuracy of YOLOs are negatively affected by the NMS. Recently, end-to-end Transformer-based detectors (DETRs) have provided an alternative to eliminating NMS. Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS. In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. We build RT-DETR in two steps, drawing on the advanced DETR: first we focus on maintaining accuracy while improving speed, followed by maintaining speed while improving accuracy. Specifically, we design an efficient hybrid encoder to…
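For readers unfamiliar with NMS (non-maximum suppression), the post-processing step the abstract refers to, here is a minimal pure-Python sketch of the classic greedy variant. It is an illustration rather than the code used by any particular YOLO release; the function names `iou` and `nms` and the default thresholds are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_thr: float = 0.25,  # assumed default, for illustration only
    iou_thr: float = 0.5,     # assumed default, for illustration only
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Step 1: discard low-confidence predictions.
    candidates = [i for i, s in enumerate(scores) if s >= score_thr]
    # Step 2: visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in candidates:
        # Step 3: keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
example_scores = [0.9, 0.8, 0.7, 0.1]
kept = nms(example_boxes, example_scores)
```

Both thresholds are hand-set, and the loop runs after the network has finished, so its cost depends on how many candidate boxes survive the score filter. An end-to-end detector in the DETR family predicts a final set of boxes directly and needs no such step.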
Code & Models
- 🤗 PekingU/rtdetr_r101vd_coco_o365 (model) · 77k downloads · 16 likes
- 🤗 PekingU/rtdetr_r34vd (model) · 223 downloads · 3 likes
- 🤗 PekingU/rtdetr_r18vd_coco_o365 (model) · 1.9M downloads · 4 likes
- 🤗 PekingU/rtdetr_r50vd_coco_o365 (model) · 73k downloads · 17 likes
- 🤗 PekingU/rtdetr_r18vd (model) · 8.4k downloads · 5 likes
- 🤗 PekingU/rtdetr_r50vd (model) · 37k downloads · 29 likes
- 🤗 PekingU/rtdetr_r101vd (model) · 1.4k downloads · 4 likes
- 🤗 apolloparty/rtdetr_v2_r101vd (model) · 73 downloads
- 🤗 prd5/v3-rtdetr-r50-gambling-finetune (model) · 4 downloads
- 🤗 iqqy-x/rtdetr-r50-gambling-finetune (model) · 8 downloads
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Softmax · Adam · Absolute Position Encodings
