Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence

Wenchi Ma; Tianxiao Zhang; Guanghui Wang

arXiv:2112.13310 · cs.CV · December 28, 2021 · 6 cites

TL;DR

Miti-DETR introduces a residual self-attention mechanism for transformer-based object detection that mitigates rank collapse, improving feature expression, detection accuracy, and convergence speed on the COCO dataset.

Contribution

The paper proposes a novel residual self-attention architecture for transformers that prevents rank collapse and improves object detection performance.

Findings

01. Significantly improves detection precision on the COCO dataset.

02. Speeds up convergence compared to existing DETR models.

03. Can be easily integrated into other transformer-based models.

Abstract

Object Detection with Transformers (DETR) and related works reach or even surpass the highly-optimized Faster-RCNN baseline with self-attention network architectures. Inspired by the evidence that pure self-attention possesses a strong inductive bias that leads to the transformer losing the expressive power with respect to network depth, we propose a transformer architecture with a mitigatory self-attention mechanism by applying possible direct mapping connections in the transformer architecture to mitigate the rank collapse so as to counteract feature expression loss and enhance the model performance. We apply this proposal in object detection tasks and develop a model named Miti-DETR. Miti-DETR reserves the inputs of each single attention layer to the outputs of that layer so that the "non-attention" information has participated in any attention propagation. The formed residual…
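The core idea in the abstract, carrying each attention layer's input forward to its output, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with identity query/key/value projections and omits layer normalization and feed-forward sublayers, and the helper names (`self_attention`, `residual_self_attention`, `column_spread`) are hypothetical. It only shows the wiring, plus a toy measure of how much the tokens still differ after one layer.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Pure self-attention over token rows.

    Assumption: identity Q/K/V projections and a single head, to keep the
    sketch small. Each output row is a convex combination of input rows.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


def residual_self_attention(x):
    """Attention output plus the layer's own input (direct mapping)."""
    attended = self_attention(x)
    return [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attended)]


def column_spread(x, c=0):
    """Max minus min of one feature column: a toy proxy for token diversity."""
    col = [row[c] for row in x]
    return max(col) - min(col)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pure = self_attention(tokens)
mitigated = residual_self_attention(tokens)
print(column_spread(tokens), column_spread(pure), column_spread(mitigated))
```

Because pure attention averages the tokens, its outputs sit closer together than its inputs; in this toy example, adding the input back keeps the tokens more distinct. That is only an illustration of the intuition behind rank collapse, not a reproduction of the paper's analysis or results.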

Code & Models

Repositories

wenchima/miti-detr (official)

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings