Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads
Xiaohu Jiang, Ze Chen, Zhicheng Wang, Erjin Zhou, Chun Yuan

TL;DR
This paper introduces Guided Query Position (GQPos) to update object query locations and Similar Attention (SiA) to enhance multi-scale attention efficiency, significantly improving transformer-based detection head performance.
Contribution
It proposes GQPos for dynamic query position updating and SiA for efficient multi-scale attention fusion, advancing transformer detection methods.
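The sketch below makes "dynamic query position updating" concrete. It does not reproduce the authors' GQPos implementation. For illustration only, it assumes three things:

- Each layer re-derives the query position from the query's current reference box (cx, cy, w, h) using a DETR-style sinusoidal embedding.
- Each layer refines that box with an offset in inverse-sigmoid space.
- The per-layer offsets are fixed and stand in for values a real detection head would predict from the query content. A real head would also usually project the embedding with an MLP.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to (0, 1)

def sine_embed(value: float, num_feats: int = 8, temperature: float = 10000.0) -> List[float]:
    """DETR-style sinusoidal embedding of one normalized coordinate."""
    out: List[float] = []
    for i in range(num_feats // 2):
        angle = value * 2 * math.pi / temperature ** (2 * i / num_feats)
        out.extend([math.sin(angle), math.cos(angle)])
    return out

def query_position_from_box(box: Box, num_feats: int = 8) -> List[float]:
    """Query position = concatenated embeddings of cx, cy, w, h (4 * num_feats values)."""
    return [v for coord in box for v in sine_embed(coord, num_feats)]

def refine_box(box: Box, delta: Sequence[float], eps: float = 1e-5) -> Box:
    """Hypothetical per-layer update: add an offset in inverse-sigmoid space."""
    def inv_sigmoid(x: float) -> float:
        x = min(max(x, eps), 1 - eps)
        return math.log(x / (1 - x))
    cx, cy, w, h = (1 / (1 + math.exp(-(inv_sigmoid(b) + d))) for b, d in zip(box, delta))
    return (cx, cy, w, h)

def decode_with_guided_position(init_box: Box,
                                layer_deltas: Sequence[Sequence[float]],
                                num_feats: int = 8):
    """Re-embed the latest box as the query position before every decoder layer."""
    box = init_box
    positions: List[List[float]] = []
    for delta in layer_deltas:
        pos = query_position_from_box(box, num_feats)  # guided by the newest location
        positions.append(pos)
        # A real head would run cross-attention with (query content + pos) here
        # and predict `delta` from the updated query; fixed deltas stand in for that.
        box = refine_box(box, delta)
    return box, positions
```

For example, `decode_with_guided_position((0.5, 0.5, 0.2, 0.3), [(0.1, 0.0, 0.0, 0.0)] * 3)` returns three 32-value position vectors. Because the box center moves after every layer, each vector differs from the previous one. A static, learned query position would instead be reused unchanged at every layer.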
Findings
GQPos improves detection accuracy across multiple models.
SiA accelerates training and enhances multi-scale detection performance.
Combined, these methods outperform existing transformer detection heads.
Abstract
After DETR was proposed, this novel transformer-based detection paradigm, which performs several cross-attentions between object queries and feature maps to make predictions, gave rise to a series of transformer-based detection heads. These models update the object queries after each cross-attention. However, they do not renew the query position, which carries the object queries' positional information. The model therefore needs extra learning to work out which regions the query position should now represent and attend to more. To fix this issue, we propose the Guided Query Position (GQPos) method, which iteratively embeds the latest location information of the object queries into the query position. Another problem of such transformer-based detection heads is the high complexity of performing attention on multi-scale feature maps, which hinders them from improving detection performance at all scales.…
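Rough arithmetic can illustrate the point about the cost of attention on multi-scale feature maps. The sketch below assumes dense cross-attention, in which every query attends to every spatial position. It also assumes the following settings, none of which come from the paper:

- an 800 x 1333 input
- feature maps at strides 8, 16, 32 and 64
- 300 queries
- 256-dimensional features

```python
import math
from typing import List

def feature_map_tokens(height: int, width: int, strides: List[int]) -> List[int]:
    """Spatial positions per feature map, assuming each map is ceil(H/s) x ceil(W/s)."""
    return [math.ceil(height / s) * math.ceil(width / s) for s in strides]

def dense_cross_attention_macs(num_queries: int, num_keys: int, dim: int) -> int:
    """Multiply-accumulates for dense attention: Q @ K^T scores plus the weighted sum over V."""
    scores = num_queries * num_keys * dim        # one dot product per (query, key) pair
    weighted_sum = num_queries * num_keys * dim  # attention weights applied to the values
    return scores + weighted_sum

# Assumed settings (illustrative, not taken from the paper).
H, W = 800, 1333
STRIDES = [8, 16, 32, 64]
NUM_QUERIES, DIM = 300, 256

tokens = feature_map_tokens(H, W, STRIDES)
multi_scale = dense_cross_attention_macs(NUM_QUERIES, sum(tokens), DIM)
single_scale = dense_cross_attention_macs(NUM_QUERIES, tokens[2], DIM)  # stride-32 map only

print("tokens per level:", tokens)
print(f"multi-scale / single-scale cost: {multi_scale / single_scale:.1f}x")
```

Under these assumptions the four maps hold 22,223 positions, compared with 1,050 for the stride-32 map alone. Dense attention over all scales therefore costs about 21 times as much as single-scale attention, and the high-resolution stride-8 map accounts for roughly three quarters of it.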
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Label Smoothing · Byte Pair Encoding · Softmax · Multi-Head Attention
