Enhancing DETRs Variants through Improved Content Query and Similar Query Aggregation
Yingying Zhang, Chuangji Shi, Xin Guo, Jiangwei Lao, Jian Wang, Jiaotuan Wang, Jingdong Chen

TL;DR
This paper introduces a Self-Adaptive Content Query (SACQ) module and a query aggregation strategy to improve DETR variants by generating more adaptive content queries and better optimizing candidate selection, leading to over 1.0 AP improvement.
Contribution
The paper proposes a novel SACQ module and a query aggregation method to enhance DETR's performance by generating adaptive content queries and improving candidate optimization.
Findings
Achieved an improvement of more than 1.0 AP on the COCO dataset.
Effective across six different DETR variants.
Enhanced focus on target objects through adaptive queries.
Abstract
The design of the query is crucial for the performance of DETR and its variants. Each query consists of two components: a content part and a positional one. Traditionally, the content query is initialized with a zero or learnable embedding, lacking essential content information and resulting in sub-optimal performance. In this paper, we introduce a novel plug-and-play module, Self-Adaptive Content Query (SACQ), to address this limitation. The SACQ module utilizes features from the transformer encoder to generate content queries via self-attention pooling. This allows candidate queries to adapt to the input image, resulting in a more comprehensive content prior and better focus on target objects. However, this improved concentration poses a challenge for the training process that utilizes the Hungarian matching, which selects only a single candidate and suppresses other similar ones. To…
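The abstract does not spell out how the self-attention pooling is computed, so the following is only a minimal sketch of one common form of attention pooling, not the authors' implementation. It assumes a set of hypothetical learnable pooling vectors (`pool_vectors`) that score every encoder token, with each content query formed as the softmax-weighted average of the token features. The function name, shapes, and random inputs are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(encoder_feats, pool_vectors):
    """Pool N encoder tokens into K content queries.

    encoder_feats: N lists of D floats (flattened encoder output).
    pool_vectors:  K lists of D floats (hypothetical learnable parameters,
                   assumed here; the paper may parameterize this differently).
    Returns K lists of D floats, each a convex combination of the tokens.
    """
    dim = len(encoder_feats[0])
    scale = 1.0 / math.sqrt(dim)
    queries = []
    for vec in pool_vectors:
        # Scaled dot-product score of this pooling vector against every token.
        scores = [
            scale * sum(v * f for v, f in zip(vec, tok))
            for tok in encoder_feats
        ]
        weights = softmax(scores)
        # Weighted average of token features: an image-dependent content query.
        queries.append([
            sum(w * tok[d] for w, tok in zip(weights, encoder_feats))
            for d in range(dim)
        ])
    return queries


# Toy stand-ins for real encoder output and trained parameters.
random.seed(0)
N_TOKENS, DIM, N_QUERIES = 6, 4, 3
feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
pool = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_QUERIES)]
content_queries = attention_pool(feats, pool)
```

Because the pooled queries are computed from the encoder features, they change with the input image, in contrast to a zero or fixed learnable embedding that is identical for every image.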
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Dense Connections · Feedforward Network · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings
