NMS Strikes Back
Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

TL;DR
This paper compares one-to-one bipartite matching in DETRs with traditional one-to-many label assignment under the same setting. It finds that one-to-many assignment with NMS outperforms one-to-one matching, and highlights the importance of architecture over matching strategy.
Contribution
It provides a strict, same-setting comparison showing that traditional one-to-many label assignment with NMS can outperform the one-to-one matching used in DETRs, emphasizing the role of architecture in detection performance.
Findings
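One-to-many label assignment with NMS consistently outperforms one-to-one Hungarian matching under the same setting, by up to 2.5 mAP.
Deformable-DETR trained with traditional label assignment achieves 50.2 COCO mAP in 12 epochs with a ResNet50 backbone.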
Bipartite matching is unnecessary for high-performance detection transformers.
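The findings above rely on NMS to remove duplicate detections at inference time. As a rough illustration only, greedy NMS can be sketched in plain Python as follows. The box format `(x1, y1, x2, y2)`, the helper names `iou` and `nms`, the 0.5 threshold, and the toy boxes are all assumptions made for this sketch and are not taken from the paper or its code.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical toy input: two near-duplicate boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the boxes that survive
```

In this toy input the second box overlaps the first heavily and has a lower score, so it is the one suppressed. Production detectors typically use a vectorized, per-class implementation instead of this quadratic loop.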
Abstract
Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector that trains Deformable-DETR with traditional…
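To make the contrast in the abstract concrete, the sketch below places a one-to-one matching next to a one-to-many, IoU-thresholded assignment on toy boxes. It is a minimal illustration under several assumptions: the cost is simply 1 - IoU (DETR-style costs typically also include classification and box-regression terms), the one-to-one matching is found by brute force as a stand-in for the Hungarian algorithm, and the function names, the 0.5 threshold, and the boxes are hypothetical and not the authors' implementation.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_match(preds, gts):
    """Minimum-cost one-to-one matching: each ground truth gets exactly one
    distinct prediction. Brute force over permutations, so only suitable for
    tiny inputs; assumes len(preds) >= len(gts)."""
    best_cost, best = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return {p: g for g, p in enumerate(best)}  # prediction index -> GT index


def one_to_many_assign(preds, gts, iou_threshold=0.5):
    """One-to-many assignment: every prediction whose best IoU with a ground
    truth reaches the threshold becomes a positive for that ground truth."""
    assignment = {}
    if not gts:
        return assignment
    for p, box in enumerate(preds):
        ious = [iou(box, gt) for gt in gts]
        g = max(range(len(gts)), key=lambda j: ious[j])
        if ious[g] >= iou_threshold:
            assignment[p] = g
    return assignment  # prediction index -> GT index


# Hypothetical toy data: predictions 0 and 1 both overlap ground truth 0.
preds = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (200, 200, 220, 220)]
gts = [(10, 10, 50, 50), (100, 100, 140, 140)]

matched = one_to_one_match(preds, gts)     # one positive per ground truth
assigned = one_to_many_assign(preds, gts)  # several positives per ground truth allowed
```

With one-to-one matching only a single prediction per object is supervised as a positive, so duplicates are discouraged during training. With one-to-many assignment, both overlapping predictions are positives for the same object, which is why a duplicate-removal step such as NMS is needed at inference.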
Code & Models
- 🤗 jozhang97/deta-resnet-50 (model, 82 downloads, 2 likes)
- 🤗 jozhang97/deta-swin-l (model)
- 🤗 jozhang97/deta-swin-l-o365 (model)
- 🤗 jozhang97/deta-swin-large (model, 81 downloads, 19 likes)
- 🤗 jozhang97/deta-swin-large-o365 (model, 117 downloads, 1 like)
- 🤗 jozhang97/deta-resnet-50-24-epochs (model, 44 downloads, 2 likes)
- 🤗 superb-ai/deta-swin-large (model, 12 downloads)
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Layer Normalization · Dropout · Byte Pair Encoding · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection
