Detection Transformer with Stable Matching

Shilong Liu; Tianhe Ren; Jiayu Chen; Zhaoyang Zeng; Hao Zhang; Feng Li; Hongyang Li; Jun Huang; Hang Su; Jun Zhu; Lei Zhang

arXiv:2304.04742 · cs.CV · April 11, 2023 · 5 cites

TL;DR

This paper traces unstable matching across DETR decoder layers to a multi-optimization path problem and proposes two simple modifications that use positional metrics such as IoU to supervise classification scores, yielding consistent improvements over baselines on the COCO detection benchmark.

Contribution

The paper introduces a position-supervised loss and a position-modulated cost to stabilize matching in DETR, improving detection accuracy across several DETR variants.
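The idea of a position-modulated matching cost can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the modulation form (score times IoU raised to a hypothetical exponent `alpha`), the function names, and the use of brute-force search in place of the Hungarian algorithm. It is not the authors' implementation, and it keeps only the classification term of the cost, whereas DETR-style matchers also include box regression terms.

```python
from itertools import permutations


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(scores, pred_boxes, gt_boxes, modulate, alpha=0.5):
    """One-to-one matching; returns, for each ground-truth box, the index
    of the prediction assigned to it.

    Assumed cost: -score (plain) or -score * IoU**alpha (modulated).
    Brute force over assignments stands in for the Hungarian algorithm,
    so this only suits tiny examples.
    """
    def cost(i, j):
        p = scores[i]
        if modulate:
            p = p * box_iou(pred_boxes[i], gt_boxes[j]) ** alpha
        return -p

    best, best_cost = None, float("inf")
    for perm in permutations(range(len(scores)), len(gt_boxes)):
        total = sum(cost(i, j) for j, i in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return list(best)


# One ground-truth box and two predictions: prediction 0 is confident but
# poorly localized, prediction 1 is less confident but well localized.
gt = [(0.0, 0.0, 10.0, 10.0)]
boxes = [(6.0, 6.0, 16.0, 16.0), (1.0, 1.0, 10.0, 10.0)]
scores = [0.9, 0.6]

plain = match(scores, boxes, gt, modulate=False)      # [0]
modulated = match(scores, boxes, gt, modulate=True)   # [1]
```

With the score-only cost the confident but badly placed prediction wins the match; once the score is modulated by IoU, the well-localized prediction is selected instead.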

Findings

01

Achieves 50.4 AP on COCO with a ResNet-50 backbone in 12 epochs when integrated with DINO

02

Reaches 51.5 AP on COCO with ResNet-50 in 24 epochs, reported as a new record for that setting

03

Attains 63.8 AP on COCO test-dev with Swin-Large backbone

Abstract

This paper is concerned with the matching stability problem across different decoder layers in DEtection TRansformers (DETR). We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR. To address this problem, we show that the most important design is to use and only use positional metrics (like IOU) to supervise classification scores of positive examples. Under the principle, we propose two simple yet effective modifications by integrating positional metrics to DETR's classification loss and matching cost, named position-supervised loss and position-modulated cost. We verify our methods on several DETR variants. Our methods show consistent improvements over baselines. By integrating our methods with DINO, we achieve 50.4 and 51.5 AP on the COCO detection benchmark using ResNet-50…
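The principle of supervising the classification scores of positive examples with a positional metric can be sketched as a binary cross-entropy whose target is the IoU of the matched box rather than a hard label of 1. This is a minimal sketch under stated assumptions: the raw IoU is used as the target, negatives are supervised toward 0, the sum is normalized by the number of positives, and any focal-style weighting is omitted. The function names are hypothetical and this is not the paper's exact loss.

```python
import math


def bce(p, target, eps=1e-7):
    """Binary cross-entropy between a probability p and a soft target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def position_supervised_loss(scores, ious, is_positive):
    """Classification loss whose positive targets come from IoU.

    scores:      predicted class probabilities, one per query
    ious:        IoU of each query's box with its matched ground truth
                 (ignored for negatives)
    is_positive: True for queries matched to a ground-truth box
    Normalizing by the number of positives is an assumption.
    """
    total = 0.0
    for p, iou, pos in zip(scores, ious, is_positive):
        target = iou if pos else 0.0
        total += bce(p, target)
    return total / max(1, sum(is_positive))


# For a well-localized positive (IoU 0.9), a high score is rewarded.
well_high = position_supervised_loss([0.9], [0.9], [True])
well_low = position_supervised_loss([0.3], [0.9], [True])

# For a poorly localized positive (IoU 0.3), the same high score is penalized.
poor_high = position_supervised_loss([0.9], [0.3], [True])
poor_low = position_supervised_loss([0.3], [0.3], [True])
```

Because the cross-entropy is minimized when the score equals the target, the classification score is pushed toward the localization quality of the box, so a confident score on a badly placed box is no longer a low-loss solution.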

Code & Models

Videos

Detection Transformer with Stable Matching · youtube

Taxonomy

Topics: Digital Media Forensic Detection · Wireless Signal Modulation Classification · Anomaly Detection Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Vision Transformer · Dropout · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Softmax · Linear Layer · Byte Pair Encoding · Layer Normalization