RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision
Shuo Wang, Chunlong Xia, Feng Lv, Yifeng Shi

TL;DR
RT-DETRv3 introduces hierarchical dense positive supervision and novel training strategies to enhance real-time transformer-based object detection, significantly improving accuracy while maintaining efficiency.
Contribution
The paper proposes a hierarchical dense supervision method and a new learning strategy for RT-DETR, boosting training quality and detection performance in real-time object detection.
Findings
RT-DETRv3 outperforms existing real-time detectors on COCO.
RT-DETRv3-R18 achieves 48.1% AP, surpassing previous models.
RT-DETRv3-R101 reaches 54.6% AP, outperforming YOLOv10-X.
Abstract
RT-DETR is the first real-time end-to-end transformer-based object detector. Its efficiency comes from the framework design and the Hungarian matching. However, compared to dense supervision detectors like the YOLO series, the Hungarian matching provides much sparser supervision, leading to insufficient model training and making optimal results difficult to achieve. To address these issues, we propose a hierarchical dense positive supervision method based on RT-DETR, named RT-DETRv3. Firstly, we introduce a CNN-based auxiliary branch that provides dense supervision and collaborates with the original decoder to enhance the encoder's feature representation. Secondly, to address insufficient decoder training, we propose a novel learning strategy involving self-attention perturbation. This strategy diversifies label assignment for positive samples across multiple query groups, thereby enriching…
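The contrast the abstract draws between sparse one-to-one matching and dense supervision can be illustrated with a small sketch. This is not the paper's implementation: the boxes, the IoU-only cost, the 0.5 threshold, and the function names `one_to_one_match` and `dense_match` are all assumptions made for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm (it finds the same optimal assignment on tiny inputs). DETR-style detectors typically use a matching cost that also includes classification terms, which is omitted here.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def one_to_one_match(preds, gts):
    """Optimal one-to-one assignment minimising the total cost (1 - IoU).

    Brute force over permutations; a stand-in for the Hungarian algorithm
    that is only practical for tiny inputs. Assumes len(preds) >= len(gts).
    Returns a dict mapping prediction index -> ground-truth index.
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {p: g for g, p in enumerate(best_perm)}


def dense_match(preds, gts, thr=0.5):
    """One-to-many assignment: a prediction is positive for its best-IoU
    ground truth whenever that IoU reaches the (assumed) threshold."""
    assigned = {}
    for p, pred_box in enumerate(preds):
        ious = [iou(pred_box, gt_box) for gt_box in gts]
        g = max(range(len(gts)), key=ious.__getitem__)
        if ious[g] >= thr:
            assigned[p] = g
    return assigned


# Hypothetical toy data: two objects, six candidate predictions.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0, 0, 10, 10),    # exact hit on object 0
    (1, 1, 11, 11),    # close to object 0
    (0, 0, 9, 10),     # close to object 0
    (20, 20, 30, 30),  # exact hit on object 1
    (21, 20, 31, 30),  # close to object 1
    (50, 50, 60, 60),  # background
]

sparse = one_to_one_match(preds, gts)
dense = dense_match(preds, gts)
print("one-to-one positives:", len(sparse))
print("dense positives:", len(dense))
```

On this toy input the one-to-one matcher yields one positive per object (two in total), while the thresholded dense rule marks five of the six predictions as positive. That gap in the number of supervised samples is the sparsity the abstract refers to.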
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques
