DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao; Wenyu Lv; Shangliang Xu; Jinman Wei; Guanzhong Wang; Qingqing Dang; Yi Liu; Jie Chen

arXiv:2304.08069 · cs.CV · April 4, 2024 · 228 cites

TL;DR

RT-DETR is a real-time, end-to-end Transformer-based object detector that needs no NMS post-processing. It outperforms comparable YOLO detectors in both speed and accuracy by combining an efficient hybrid encoder with improved query selection.

Contribution

This paper introduces RT-DETR, which the authors describe as, to their knowledge, the first real-time end-to-end object detector. It pairs an efficient hybrid encoder, designed to improve speed, with uncertainty-minimal query selection, designed to improve accuracy.

Findings

1. RT-DETR with a ResNet-50 backbone achieves 53.1% AP on COCO at 108 FPS on a T4 GPU.

2. RT-DETR outperforms YOLOs and DINO in speed and accuracy on multiple benchmarks.

3. Speed can be tuned by changing the number of decoder layers used at inference, giving flexible speed-accuracy trade-offs without retraining.

Abstract

The YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy. However, we observe that the speed and accuracy of YOLOs are negatively affected by the NMS. Recently, end-to-end Transformer-based detectors (DETRs) have provided an alternative to eliminating NMS. Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS. In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. We build RT-DETR in two steps, drawing on the advanced DETR: first we focus on maintaining accuracy while improving speed, followed by maintaining speed while improving accuracy. Specifically, we design an efficient hybrid encoder to…
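For readers unfamiliar with the post-processing step the abstract refers to, the following is a minimal sketch of greedy non-maximum suppression (NMS) in plain Python. It is not code from the paper or from any YOLO implementation. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default thresholds are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 >= x1, y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,    # illustrative default, not from the paper
    score_threshold: float = 0.0,  # illustrative default, not from the paper
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

The result depends on both thresholds, and the pairwise overlap checks add work that grows with the number of candidate boxes. These are the kinds of post-processing costs an end-to-end detector avoids by predicting a final set of boxes directly.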

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Softmax · Adam · Absolute Position Encodings