Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

Gongjie Zhang; Zhipeng Luo; Jiaxing Huang; Shijian Lu; Eric P. Xing

arXiv:2207.14172 · cs.CV · February 7, 2023 · 5 cites

TL;DR

This paper introduces SAM-DETR++, a DETR variant built around a plug-and-play module that aligns the semantics of object queries and encoded image features. The alignment accelerates DETR's convergence and supports multi-scale feature fusion, improving object detection performance.

Contribution

SAM-DETR++ contributes a semantic-aligned matching module that improves DETR's convergence speed and detection accuracy by aligning feature semantics and fusing multi-scale features.

Findings

01 Achieves 44.8% AP with only 12 training epochs.

02 Attains 49.1% AP after 50 epochs on COCO.

03 Outperforms existing DETR variants in convergence speed and accuracy.

Abstract

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between object queries and encoded image features. With this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can be easily matched to relevant regions with similar semantics. Besides, SAM-DETR++ searches for multiple representative keypoints and…
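The matching idea in the abstract (object queries and encoded image features placed in one embedding space, then matched by semantic similarity) can be illustrated with a toy sketch. This is not the SAM-DETR++ implementation: the function names (`project`, `match_queries`), the identity projection matrices, and the use of scaled dot-product similarity with a softmax are all assumptions chosen for illustration, written in plain Python so it runs without any deep learning library.

```python
import math


def project(vec, weight):
    """Apply a linear map; `weight` is a list of rows, `vec` a list of floats."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_queries(queries, features, w_query, w_feature):
    """Return, for each query, matching weights over all feature locations.

    Hypothetical helper: both inputs are projected into a common space,
    then compared with scaled dot products (an assumed similarity measure).
    """
    q_emb = [project(q, w_query) for q in queries]
    f_emb = [project(f, w_feature) for f in features]
    scale = math.sqrt(len(q_emb[0]))
    weights = []
    for q in q_emb:
        scores = [sum(a * b for a, b in zip(q, f)) / scale for f in f_emb]
        weights.append(softmax(scores))
    return weights


# Toy data: three feature locations and two queries, all 2-dimensional.
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed projection, for illustration only
features = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
queries = [[4.0, 0.0], [0.0, 4.0]]

attn = match_queries(queries, features, identity, identity)
# Index of the feature location each query matches most strongly.
best = [row.index(max(row)) for row in attn]
```

When queries and features already share an embedding space, as in this toy setup, each query puts most of its weight on the location with the most similar embedding. The abstract attributes DETR's slow convergence to the lack of such alignment.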

Code & Models

Repositories

zhanggongjie/sam-detr (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Label Smoothing