Recurrent Glimpse-based Decoder for Detection with Transformer
Zhe Chen, Jing Zhang, Dacheng Tao

TL;DR
This paper introduces REGO, a recurrent glimpse-based decoder that improves DETR object detection by focusing attention on foreground objects through multi-stage processing, significantly reducing training epochs needed for high performance.
Contribution
The paper proposes a novel recurrent glimpse-based decoder (REGO) that improves DETR detection accuracy and training efficiency by iteratively focusing on regions of interest.
Findings
REGO achieves 44.8 AP on MSCOCO with only 36 epochs.
REGO boosts the performance of DETR detectors by up to 7% (relative gain) under the same 50-epoch training setting.
REGO can be integrated into existing DETR variants without disrupting end-to-end training.
Abstract
Although detection with Transformer (DETR) is increasingly popular, its global attention modeling requires an extremely long training period to optimize and achieve promising detection performance. As an alternative to existing studies that mainly develop advanced feature or embedding designs to tackle the training issue, we point out that Region-of-Interest (RoI) based detection refinement can easily help mitigate the difficulty of training for DETR methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper. In particular, REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects more accurately. In each processing stage, visual features are extracted as glimpse features from RoIs with enlarged bounding box areas of detection results from the previous stage. Then, a…
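The RoI step mentioned in the abstract (glimpse features taken from enlarged bounding boxes of the previous stage's detections) can be illustrated with a minimal sketch. This is not the authors' implementation: the normalized (cx, cy, w, h) box format, the function names `enlarge_box` and `glimpse_rois`, and the scale factor are all assumptions chosen for illustration, and the feature extraction itself (for example, an RoI pooling operator) is left out.

```python
from typing import List, Tuple

# Assumed box format: (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]
# Output RoI format: (x0, y0, x1, y1), normalized to [0, 1].
RoI = Tuple[float, float, float, float]


def enlarge_box(box: Box, scale: float) -> RoI:
    """Scale a box about its center, then clip the corners to the image."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    cx, cy, w, h = box
    half_w = 0.5 * w * scale
    half_h = 0.5 * h * scale
    x0 = max(0.0, cx - half_w)
    y0 = max(0.0, cy - half_h)
    x1 = min(1.0, cx + half_w)
    y1 = min(1.0, cy + half_h)
    return (x0, y0, x1, y1)


def glimpse_rois(prev_boxes: List[Box], scale: float) -> List[RoI]:
    """Turn the previous stage's detections into enlarged RoIs.

    `scale` is a hypothetical enlargement factor; the values used in the
    paper are not given in the text above.
    """
    return [enlarge_box(b, scale) for b in prev_boxes]


if __name__ == "__main__":
    detections = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.4, 0.4)]
    for roi in glimpse_rois(detections, scale=2.0):
        print(roi)
```

Enlarging the box before extracting features gives the next stage some context around the object, and clipping keeps the RoI inside the image. In a multi-stage setup the function would be called once per stage on that stage's input detections.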
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Image Enhancement Techniques
Methods: Attention Is All You Need · Linear Layer · Deformable Attention Module · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Multi-Head Attention · Deformable DETR · Position-Wise Feed-Forward Layer
