DETRs with Hybrid Matching
Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, Weihong Lin, Lei Sun, Chao Zhang, Han Hu

TL;DR
H-DETR introduces a hybrid matching scheme combining one-to-one and one-to-many matching during training, significantly enhancing DETR's accuracy while preserving its end-to-end inference efficiency across various vision tasks.
Contribution
The paper proposes a simple hybrid matching strategy that improves DETR's training efficacy and accuracy without affecting inference efficiency, applicable to multiple DETR variants.
Findings
Significant accuracy improvements across multiple DETR-based models.
Maintains end-to-end inference efficiency of original DETR.
Effective enhancement for various visual detection tasks.
Abstract
One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections. This end-to-end signature is important for the versatility of DETR, and it has been generalized to broader vision tasks. However, we note that there are few queries assigned as positive samples and the one-to-one set matching significantly reduces the training efficacy of positive samples. We propose a simple yet effective method based on a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training. Our hybrid strategy has been shown to significantly improve accuracy. In inference, only the original one-to-one match branch is used, thus maintaining the end-to-end merit and the same inference efficiency…
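The difference between the two matching branches can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes that the one-to-many assignment is obtained by repeating each ground-truth object k times and running the same optimal matcher, and it uses a brute-force search in place of a Hungarian solver so that it stays dependency-free. The function names, the cost values, and k are all hypothetical, and the same toy cost matrix is reused for both branches purely for brevity.

```python
from itertools import permutations


def match_one_to_one(cost):
    """Return (query, gt) pairs that minimise the total matching cost.

    cost[q][g] is the cost of assigning query q to ground truth g.
    Brute force over all assignments; only suitable for toy sizes and
    requires at least as many queries as ground truths.
    """
    num_queries = len(cost)
    num_gt = len(cost[0])
    best_pairs, best_total = None, float("inf")
    for queries in permutations(range(num_queries), num_gt):
        # queries[g] is the query assigned to ground truth g.
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_pairs = [(q, g) for g, q in enumerate(queries)]
            best_total = total
    return best_pairs


def match_one_to_many(cost, k):
    """Assumed one-to-many scheme: repeat every ground truth k times,
    then reuse the one-to-one matcher on the widened cost matrix."""
    repeated = [[row[g] for g in range(len(row)) for _ in range(k)]
                for row in cost]
    pairs = match_one_to_one(repeated)
    # Column index g * k + r maps back to ground truth g.
    return sorted((q, col // k) for q, col in pairs)


# Hypothetical costs for 4 queries (rows) and 2 ground truths (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.2],
    [0.7, 0.3],
]

primary = match_one_to_one(cost)          # used in training and inference
auxiliary = match_one_to_many(cost, k=2)  # training-only auxiliary branch

print("one-to-one positives:", primary)
print("one-to-many positives:", auxiliary)
```

On this toy input the one-to-one branch supervises two queries as positives, while the auxiliary branch supervises all four. Under the assumption above, that is the sense in which one-to-many matching supplies more positive samples during training, while dropping the auxiliary branch at inference leaves the one-to-one predictions free of duplicates.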
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Adam · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Residual Connection · Absolute Position Encodings
