Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

Byungseok Roh; JaeWoong Shin; Wuhyun Shin; Saehoon Kim

arXiv:2111.14330 · cs.CV · March 7, 2022 · 91 citations

TL;DR

Sparse DETR updates only a selected subset of encoder tokens in a transformer-based object detector, reducing computation and increasing inference speed while maintaining or improving detection performance.

Contribution

It proposes a novel sparse token updating strategy in transformer encoders for object detection, improving efficiency without sacrificing accuracy.

Findings

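01. Achieves better detection performance than Deformable DETR on the COCO dataset while using only 10% of the encoder tokens.

02. Reduces total computation cost by 38% compared to Deformable DETR.

03. Increases frames per second (FPS) by 42% compared to Deformable DETR.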
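
To put the two efficiency figures on a common footing, the short calculation below converts them into ratios against a baseline. It is only a worked arithmetic sketch: it assumes both percentages are measured relative to Deformable DETR, and the variable names are illustrative, not taken from the paper or its code.

```python
# Figures quoted in the findings, assumed relative to Deformable DETR.
compute_reduction = 0.38   # total computation cost falls by 38%
fps_gain = 0.42            # frames per second rise by 42%

# Remaining compute as a fraction of the baseline.
compute_ratio = 1.0 - compute_reduction

# Time per frame is the reciprocal of FPS, so a 42% FPS gain
# shortens per-frame time to 1 / 1.42 of the baseline.
latency_ratio = 1.0 / (1.0 + fps_gain)

print(f"compute vs. baseline: {compute_ratio:.2f}x")
print(f"time per frame vs. baseline: {latency_ratio:.3f}x")
print(f"time per frame saved: {(1.0 - latency_ratio) * 100:.1f}%")
```

Under these assumptions, a 42% FPS gain corresponds to roughly 30% less time per frame, a smaller saving than the 38% drop in computation.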

Abstract

DETR is the first end-to-end object detector using a transformer encoder-decoder architecture and demonstrates competitive performance but low computational efficiency on high-resolution feature maps. The subsequent work, Deformable DETR, enhances the efficiency of DETR by replacing dense attention with deformable attention, which achieves 10x faster convergence and improved performance. Deformable DETR uses multi-scale features to improve performance; however, the number of encoder tokens increases by 20x compared to DETR, and the computation cost of the encoder attention remains a bottleneck. In our preliminary experiment, we observe that the detection performance hardly deteriorates even if only a part of the encoder tokens is updated. Inspired by this observation, we propose Sparse DETR that selectively updates only the tokens expected to be referenced by the decoder, thus help…
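
The selective update described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the `scores` array is a hypothetical stand-in for whatever signal predicts which tokens the decoder will reference, plain dense softmax attention is swapped in for deformable attention to keep the example short, and the function name `sparse_token_update` and the `keep_ratio` parameter are made up for illustration.

```python
import numpy as np

def sparse_token_update(tokens, scores, keep_ratio=0.1):
    """Update only the top-scoring tokens; pass the rest through unchanged.

    tokens: (N, d) encoder tokens.
    scores: (N,) hypothetical saliency scores (an assumption of this sketch).
    """
    n, d = tokens.shape
    k = max(1, int(round(n * keep_ratio)))
    # Indices of the k highest-scoring tokens (order within the top-k is arbitrary).
    top_idx = np.argpartition(scores, -k)[-k:]
    # Selected tokens act as queries; all tokens serve as keys and values.
    queries = tokens[top_idx]                      # (k, d)
    logits = queries @ tokens.T / np.sqrt(d)       # (k, N)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    updated = tokens.copy()
    # Residual update applied to the selected rows only.
    updated[top_idx] = queries + weights @ tokens
    return updated, top_idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(100, 8))   # 100 toy tokens of width 8
scores = rng.random(100)             # random stand-in scores
updated, idx = sparse_token_update(tokens, scores, keep_ratio=0.1)
print(idx.shape)                     # 10 of the 100 tokens were updated
```

In this sketch the attention cost scales with the number of selected queries rather than with all tokens, which is the intuition behind updating only a fraction of the encoder tokens.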

Code & Models

Repositories

kakaobrain/sparse-detr
PyTorch · Official

Videos

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Deformable Attention Module · Label Smoothing · Softmax · Convolution · Residual Connection · Feedforward Network