Anchor DETR: Query Design for Transformer-Based Object Detection
Yingming Wang, Xiangyu Zhang, Tong Yang, Jian Sun

TL;DR
Anchor DETR introduces an anchor point-based query design for transformer object detection, improving interpretability, efficiency, and performance by focusing queries on specific regions and enabling multiple object predictions per location.
Contribution
It proposes a novel anchor point-based query design and an efficient attention variant, enhancing transformer-based object detection with better accuracy and faster training.
Findings
Achieves 44.2 AP on MSCOCO with 50 epochs
Runs faster than DETR with fewer training epochs
Outperforms previous transformer detectors on the MSCOCO benchmark
Abstract
In this paper, we propose a novel query design for transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an explicit physical meaning, and we cannot explain where it will focus. It is also difficult to optimize, as the prediction slot of each object query does not have a specific mode. In other words, each object query will not focus on a specific region. To solve these problems, in our query design, object queries are based on anchor points, which are widely used in CNN-based detectors. So each object query focuses on the objects near the anchor point. Moreover, our query design can predict multiple objects at one position to solve the difficulty: "one region, multiple objects". In addition, we design an attention variant, which can reduce the memory cost while…
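As a rough illustration of the query design described in the abstract, the sketch below builds one query per (anchor point, pattern) pair, so that several objects can be predicted at the same position. It is a minimal sketch under stated assumptions, not the authors' implementation: the uniform grid, the sinusoidal encoding, combining the two parts by addition, and the names `make_anchor_grid`, `sine_encode`, and `build_queries` are all hypothetical choices made for illustration, and the constant pattern vectors stand in for embeddings that would be learned in a real model.

```python
import math

def make_anchor_grid(rows, cols):
    """Uniform grid of (x, y) anchor points in the unit square (assumed layout)."""
    return [((j + 0.5) / cols, (i + 0.5) / rows)
            for i in range(rows) for j in range(cols)]

def sine_encode(point, dim=8, temperature=10000.0):
    """Sinusoidal encoding of a 2D point (assumed encoding); dim must be divisible by 4."""
    per_coord = dim // 2  # half of the channels for x, half for y
    enc = []
    for coord in point:
        for k in range(per_coord // 2):
            freq = temperature ** (2 * k / per_coord)
            enc.append(math.sin(2 * math.pi * coord / freq))
            enc.append(math.cos(2 * math.pi * coord / freq))
    return enc

def build_queries(anchors, patterns, dim=8):
    """Create one query per (anchor, pattern) pair.

    Each query adds the anchor's position encoding to a pattern vector, so
    all queries that share an anchor look at the same location but can
    specialise on different objects there.
    """
    queries = []
    for anchor in anchors:
        pos = sine_encode(anchor, dim)
        for p_idx, pattern in enumerate(patterns):
            queries.append({
                "anchor": anchor,
                "pattern": p_idx,
                "embedding": [p + q for p, q in zip(pos, pattern)],
            })
    return queries

anchors = make_anchor_grid(rows=2, cols=3)        # 6 anchor points
patterns = [[0.0] * 8, [0.1] * 8, [-0.1] * 8]     # placeholders for learned patterns
queries = build_queries(anchors, patterns)
print(len(queries), "queries for", len(anchors), "anchor points")
```

With 6 anchor points and 3 patterns this yields 18 queries. Unlike a free-form learned embedding, each query carries an explicit location, which is the interpretability argument the abstract makes.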
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Softmax
