Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads

Xiaohu Jiang; Ze Chen; Zhicheng Wang; Erjin Zhou; Chun Yuan

arXiv:2108.09691 · cs.CV · August 24, 2021 · 1 citation

Open Access

TL;DR

This paper introduces Guided Query Position (GQPos) to update object query locations and Similar Attention (SiA) to enhance multi-scale attention efficiency, significantly improving transformer-based detection head performance.

Contribution

It proposes GQPos for dynamic query position updating and SiA for efficient multi-scale attention fusion, advancing transformer detection methods.

Findings

01 GQPos improves detection accuracy across multiple models.

02 SiA accelerates training and enhances multi-scale detection performance.

03 Combined, these methods outperform existing transformer detection heads.

Abstract

After DETR was proposed, this novel transformer-based detection paradigm, which performs several cross-attentions between object queries and feature maps for predictions, has given rise to a series of transformer-based detection heads. These models update the object queries after each cross-attention. However, they do not update the query position, which carries the object queries' position information. Thus the model needs extra learning to figure out the newest regions that the query position should express and that need more attention. To fix this issue, we propose the Guided Query Position (GQPos) method, which iteratively embeds the latest location information of object queries into the query position. Another problem of such transformer-based detection heads is the high complexity of performing attention on multi-scale feature maps, which hinders them from improving detection performance at all scales.…
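The idea of re-deriving the query position from the latest location estimate at every decoder layer can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say how the location is embedded, so the sinusoidal encoding of a normalized box (cx, cy, w, h), the helper names `sine_embed`, `refine_box`, and `run_decoder`, and the fixed per-layer deltas standing in for a real decoder layer's box prediction are all hypothetical.

```python
import math


def sine_embed(box, dim_per_coord=4, temperature=10000.0):
    """Sinusoidal embedding of a normalized box (assumed encoding, not from the paper)."""
    out = []
    for value in box:
        for i in range(dim_per_coord // 2):
            freq = temperature ** (2 * i / dim_per_coord)
            angle = 2 * math.pi * value / freq
            out.append(math.sin(angle))
            out.append(math.cos(angle))
    return out


def refine_box(box, delta):
    """Hypothetical stand-in for a decoder layer's box update: shift, then clamp to [0, 1]."""
    return tuple(min(1.0, max(0.0, b + d)) for b, d in zip(box, delta))


def run_decoder(initial_box, deltas):
    """Rebuild the query position from the latest box before each layer."""
    box = initial_box
    positions = []
    for delta in deltas:
        # The query position reflects the most recent location estimate,
        # rather than staying fixed across layers.
        query_pos = sine_embed(box)
        positions.append(query_pos)
        # Cross-attention using query_pos would happen here in a real model.
        box = refine_box(box, delta)
    return box, positions


final_box, positions = run_decoder(
    (0.5, 0.5, 0.2, 0.2),
    [(0.1, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0)],
)
```

In this sketch the position embedding passed to each layer differs whenever the previous layer moved the box, which is the behavior the abstract contrasts with heads that leave the query position unchanged.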

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Label Smoothing · Byte Pair Encoding · Softmax · Multi-Head Attention