RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer

Wenyu Lv; Yian Zhao; Qinyao Chang; Kui Huang; Guanzhong Wang; Yi Liu

arXiv:2407.17140·cs.CV·July 25, 2024

TL;DR

RT-DETRv2 introduces several improvements over previous real-time detection transformers, including flexible multi-scale feature sampling, deployment-friendly sampling operators, and adaptive training strategies, resulting in enhanced performance and practicality.

Contribution

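The paper presents RT-DETRv2, which improves on RT-DETR in three ways: a distinct number of deformable-attention sampling points for each feature scale, an optional discrete sampling operator that replaces grid_sample to ease deployment, and a training strategy that combines dynamic data augmentation with scale-adaptive hyperparameters.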

Findings

01 Enhanced detection accuracy over previous models

02 Reduced deployment constraints compared to RT-DETR

03 Effective training with dynamic data augmentation

Abstract

In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR, and opens up a set of bag-of-freebies for flexibility and practicality, as well as optimizing the training strategy to achieve enhanced performance. To improve the flexibility, we suggest setting a distinct number of sampling points for features at different scales in the deformable attention to achieve selective multi-scale feature extraction by the decoder. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator that is specific to RT-DETR compared to YOLOs. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameters customization to improve…
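The two sampling ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the discrete operator, so the code assumes it means rounding each sampling location to the nearest pixel and indexing directly, in place of the bilinear interpolation that grid_sample performs. The function names, the per-scale point counts (2 and 1), the offsets, and the attention weights are all hypothetical values chosen for illustration.

```python
import math

def bilinear_sample(feat, x, y):
    """Bilinear interpolation at continuous pixel coords (x, y); zero outside the map."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                value += wy * wx * feat[yi][xi]
    return value

def discrete_sample(feat, x, y):
    """ASSUMED discrete variant: round to the nearest pixel, clamp, then index."""
    h, w = len(feat), len(feat[0])
    xi = min(max(math.floor(x + 0.5), 0), w - 1)
    yi = min(max(math.floor(y + 0.5), 0), h - 1)
    return feat[yi][xi]

def sample_multiscale(feats, ref, offsets, weights, sampler):
    """Weighted sum of sampled values over all scales.

    feats:   one 2D feature map (list of rows) per scale
    ref:     reference point (rx, ry), normalized to [0, 1]
    offsets: offsets[l] is a list of (dx, dy) pixel offsets for scale l;
             its length is that scale's number of sampling points
    weights: weights[l][k] is the attention weight of point k at scale l
    """
    out = 0.0
    for feat, offs, ws in zip(feats, offsets, weights):
        h, w = len(feat), len(feat[0])
        cx, cy = ref[0] * (w - 1), ref[1] * (h - 1)
        for (dx, dy), a in zip(offs, ws):
            out += a * sampler(feat, cx + dx, cy + dy)
    return out

# Two scales with a different number of sampling points each (hypothetical: 2 and 1).
feat0 = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 map, values 0..15
feat1 = [[10, 20], [30, 40]]                               # 2x2 map
feats = [feat0, feat1]
ref = (0.5, 0.5)
offsets = [[(0.0, 0.0), (0.5, -0.5)], [(0.0, 0.0)]]
weights = [[0.25, 0.25], [0.5]]  # attention weights sum to 1 across all points

print("bilinear:", sample_multiscale(feats, ref, offsets, weights, bilinear_sample))
print("discrete:", sample_multiscale(feats, ref, offsets, weights, discrete_sample))
```

Under these assumptions the discrete sampler needs only rounding and an index lookup, which is why such an operator would be easier to export to runtimes that lack a grid_sample equivalent. The cost is that its output can differ from the interpolated value whenever a sampling location falls between pixels.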

Taxonomy

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training