Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan

TL;DR
Relation DETR introduces an explicit position relation prior as an attention bias in DETR. It builds position relation embeddings with a dedicated encoder and extends the traditional streaming pipeline into a contrastive relation pipeline, improving both convergence speed and detection accuracy.
Contribution
The paper proposes a novel position relation prior for DETR, enhancing convergence and performance, and introduces a universal relation encoder as a plug-in component for various DETR-like methods.
Findings
Achieves +2.0% AP over DINO on COCO
State-of-the-art 51.7% AP for 1x training
Faster convergence: over 40% AP after only 2 training epochs
Abstract
This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment object detection, following the verification of its statistical significance using a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement, which further extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflicts between non-duplicate predictions and positive supervision. Extensive experiments on both…
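The idea of using a position relation prior as an attention bias can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration, not the paper's implementation: the abstract does not give the relation encoding, so the sketch uses a log-scaled relative offset and size ratio between boxes in (cx, cy, w, h) format, and the names `box_relation`, `biased_self_attention`, and the projection vector `w_rel` are hypothetical. A single projection vector stands in for the relation encoder described in the abstract.

```python
import numpy as np


def box_relation(boxes, eps=1e-5):
    """Pairwise position relation features (an assumed encoding).

    boxes: (N, 4) array of (cx, cy, w, h).
    Returns an (N, N, 4) array; entry [i, j] relates box i to box j.
    """
    cx, cy, w, h = (boxes[:, k] for k in range(4))
    # Log-scaled centre distance, normalised by the size of box i.
    dx = np.log(np.abs(cx[:, None] - cx[None, :]) / (w[:, None] + eps) + 1.0)
    dy = np.log(np.abs(cy[:, None] - cy[None, :]) / (h[:, None] + eps) + 1.0)
    # Log size ratios between box i and box j.
    dw = np.log((w[:, None] + eps) / (w[None, :] + eps))
    dh = np.log((h[:, None] + eps) / (h[None, :] + eps))
    return np.stack([dx, dy, dw, dh], axis=-1)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def biased_self_attention(queries, boxes, w_rel):
    """Single-head self-attention with an additive relation bias.

    queries: (N, D) query features, boxes: (N, 4) predicted boxes,
    w_rel: (4,) hypothetical projection mapping relation features
    to one scalar bias per query pair.
    """
    d = queries.shape[-1]
    logits = queries @ queries.T / np.sqrt(d)   # (N, N) content term
    bias = box_relation(boxes) @ w_rel          # (N, N) position term
    attn = softmax(logits + bias, axis=-1)
    return attn @ queries, attn
```

In this sketch the bias is added to the attention logits before the softmax, so queries whose boxes are related in the way `w_rel` favours attend to each other more strongly regardless of content similarity. A learned multi-layer encoder, with one bias map per attention head, would be the natural extension.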
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Convolution
