Recurrent Glimpse-based Decoder for Detection with Transformer

Zhe Chen; Jing Zhang; Dacheng Tao

arXiv:2112.04632 · cs.CV · April 13, 2022

Open Access · 1 Repo

TL;DR

This paper introduces REGO, a recurrent glimpse-based decoder that improves DETR object detection by focusing attention on foreground objects through multi-stage processing, significantly reducing the number of training epochs needed to reach high performance.

Contribution

The paper proposes a novel recurrent glimpse-based decoder (REGO) that improves DETR detection accuracy and training efficiency by iteratively focusing on regions of interest.

Findings

01

REGO enables Deformable DETR to reach 44.8 AP on MSCOCO with only 36 training epochs.

02

REGO boosts the performance of DETR detectors by up to 7% (relative gain) under the same 50-epoch training setting.

03

REGO can be integrated into existing DETR variants without disrupting end-to-end training.

Abstract

Although detection with Transformer (DETR) is increasingly popular, its global attention modeling requires an extremely long training period to optimize and achieve promising detection performance. Alternative to existing studies that mainly develop advanced feature or embedding designs to tackle the training issue, we point out that the Region-of-Interest (RoI) based detection refinement can easily help mitigate the difficulty of training for DETR methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper. In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects more accurately. In each processing stage, visual features are extracted as glimpse features from RoIs with enlarged bounding box areas of detection results from the previous stage. Then, a…
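The glimpse step described in the abstract (take each box detected in the previous stage, enlarge it, and use the enlarged region as the RoI for the next stage) can be illustrated with a minimal sketch. This is not the authors' implementation: the corner-format boxes, the `enlarge_box` and `run_stages` names, the scale values, and the `refine` callable standing in for the decoder are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def enlarge_box(box: Box, scale: float, img_w: float, img_h: float) -> Box:
    """Scale a box about its center, then clip it to the image bounds."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * scale / 2.0
    half_h = (y1 - y0) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(img_w, cx + half_w),
        min(img_h, cy + half_h),
    )


def run_stages(
    boxes: List[Box],
    scales: Sequence[float],
    refine: Callable[[List[Box], List[Box]], List[Box]],
    img_w: float,
    img_h: float,
) -> List[Box]:
    """Run one refinement pass per scale (hypothetical stage loop).

    `refine` is a placeholder for a decoder stage: it receives the previous
    boxes and the enlarged glimpse RoIs and returns updated boxes.
    """
    for scale in scales:
        rois = [enlarge_box(b, scale, img_w, img_h) for b in boxes]
        boxes = refine(boxes, rois)
    return boxes


if __name__ == "__main__":
    # Identity "decoder" used only to show the data flow between stages.
    identity = lambda prev, rois: prev
    start = [(40.0, 40.0, 60.0, 60.0)]
    print(enlarge_box(start[0], 2.0, 100.0, 100.0))
    print(run_stages(start, [3.0, 2.0], identity, 100.0, 100.0))
```

In a real detector the RoIs would be used to pool visual features (for example with an RoI-align operator) before a decoder stage updates the boxes; the identity `refine` here only shows how each stage consumes the previous stage's output.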

Code & Models

Repositories

zhechen/deformable-detr-rego
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Image Enhancement Techniques

Methods: Attention Is All You Need · Linear Layer · Deformable Attention Module · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Multi-Head Attention · Deformable DETR · Position-Wise Feed-Forward Layer