DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection

Guiping Cao; Xiangyuan Lan; Wenjian Huang; Jianguo Zhang; Dongmei Jiang; Yaowei Wang

arXiv:2507.19807 · cs.CV · July 29, 2025

TL;DR

DS-Det introduces a flexible single-query paradigm and attention disentangled learning to improve transformer-based object detection, addressing fixed-query limitations and query ambiguity for enhanced efficiency and accuracy.

Contribution

The paper proposes a unified single-query paradigm and a simplified attention framework to enhance flexibility and efficiency in transformer-based object detection.

Findings

01 Outperforms existing methods on the COCO2017 and WiderPerson datasets.

02 Effectively addresses query ambiguity and attention interaction issues.

03 Demonstrates general applicability across multiple backbone models.

Abstract

Popular transformer detectors have achieved promising performance through query-based learning using attention mechanisms. However, the roles of existing decoder query types (e.g., content query and positional query) are still underexplored. These queries are generally predefined with a fixed number (fixed-query), which limits their flexibility. We find that the learning of these fixed queries is impaired by Recurrent Opposing inTeractions (ROT) between two attention operations: Self-Attention (query-to-query) and Cross-Attention (query-to-encoder), thereby degrading decoder efficiency. Furthermore, "query ambiguity" arises when shared-weight decoder layers are processed with both one-to-one and one-to-many label assignments during training, violating DETR's one-to-one matching principle. To address these challenges, we propose DS-Det, a more efficient detector capable of detecting a…
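
As a rough illustration of the "query ambiguity" described in the abstract, the toy sketch below contrasts one-to-one and one-to-many label assignment on a small cost matrix. It is not DS-Det's method: the cost values, the function names `one_to_one` and `one_to_many`, the brute-force search (standing in for Hungarian matching), and the top-k rule for one-to-many assignment are all assumptions chosen for illustration.

```python
from itertools import permutations

def one_to_one(cost):
    """Give each ground-truth box exactly one distinct query, minimizing total cost.

    cost[q][g] is a hypothetical matching cost between query q and ground truth g.
    Brute force over permutations stands in for Hungarian matching (toy sizes only).
    """
    num_q, num_g = len(cost), len(cost[0])
    best = min(
        permutations(range(num_q), num_g),
        key=lambda qs: sum(cost[q][g] for g, q in enumerate(qs)),
    )
    return {g: [q] for g, q in enumerate(best)}

def one_to_many(cost, k=2):
    """Give each ground-truth box its k lowest-cost queries (assumed top-k rule)."""
    num_q, num_g = len(cost), len(cost[0])
    return {
        g: sorted(range(num_q), key=lambda q: cost[q][g])[:k]
        for g in range(num_g)
    }

# Made-up costs for 4 queries (rows) and 2 ground-truth boxes (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.9, 0.4],
]

o2o = one_to_one(cost)        # {0: [0], 1: [2]}
o2m = one_to_many(cost, k=2)  # {0: [0, 1], 1: [2, 3]}

# Queries that are positives under one-to-many but background under one-to-one.
pos_o2o = {q for qs in o2o.values() for q in qs}
pos_o2m = {q for qs in o2m.values() for q in qs}
conflicting = sorted(pos_o2m - pos_o2o)
print(conflicting)  # [1, 3]
```

In this toy setup, queries 1 and 3 receive opposite targets under the two schemes. If the same decoder weights are trained with both assignments, those queries get contradictory supervision, which is one way to read the ambiguity the abstract refers to.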
