SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency
Yang Liu, Yao Zhang, Yixin Wang, Yang Zhang, Jiang Tian, Zhongchao Shi, Jianping Fan, Zhiqiang He

TL;DR
SAP-DETR introduces a novel approach that treats object detection as a transformation from salient points to objects, significantly accelerating convergence and improving performance over existing Transformer-based detectors.
Contribution
The paper proposes SAP-DETR, which initializes query-specific reference points and aggregates them into objects, bridging the gap between salient points and query-based detectors for faster convergence.
Findings
Achieves 1.4x faster convergence than previous methods.
Improves on state-of-the-art approaches by 1.0 AP under standard training.
Attains 46.9 AP with a ResNet-DC-101 backbone.
Abstract
Recently, the dominant DETR-based approaches apply central-concept spatial prior to accelerate Transformer detector convergency. These methods gradually refine the reference points to the center of target objects and imbue object queries with the updated central reference information for spatially conditional attention. However, centralizing reference points may severely deteriorate queries' saliency and confuse detectors due to the indiscriminative spatial prior. To bridge the gap between the reference points of salient queries and Transformer detectors, we propose SAlient Point-based DETR (SAP-DETR) by treating object detection as a transformation from salient points to instance objects. In SAP-DETR, we explicitly initialize a query-specific reference point for each object query, gradually aggregate them into an instance object, and then predict the distance from each side of the…
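The point-to-box idea in the abstract can be illustrated with a small sketch. The abstract excerpt here is cut off, so the details below are assumptions rather than the authors' implementation: the sketch assumes each query holds a reference point in normalized image coordinates, that the predicted quantities are the distances from that point to the left, top, right, and bottom sides of the box, and that reference points start on a uniform grid. The names `grid_reference_points`, `box_from_point`, and `box_center` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y), normalized to [0, 1]
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def grid_reference_points(rows: int, cols: int) -> List[Point]:
    """Assumed initializer: one reference point per query, at the cell centers of a uniform grid."""
    return [((c + 0.5) / cols, (r + 0.5) / rows)
            for r in range(rows) for c in range(cols)]


def box_from_point(point: Point, left: float, top: float,
                   right: float, bottom: float) -> Box:
    """Build a box from a point inside it and its distances to the four sides."""
    x, y = point
    return (x - left, y - top, x + right, y + bottom)


def box_center(box: Box) -> Point:
    """Geometric center of a box, for comparison with the reference point."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


points = grid_reference_points(2, 2)
box = box_from_point(points[0], left=0.125, top=0.125, right=0.375, bottom=0.25)
print(points[0], box, box_center(box))
```

In this sketch the reference point only has to lie inside the box. With unequal side distances it differs from the box center, which is the contrast with center-based reference points that the abstract draws.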
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Convolution · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Feedforward Network · Label Smoothing
