Efficient Decoder-free Object Detection with Transformers

Peixian Chen; Mengdan Zhang; Yunhang Shen; Kekai Sheng; Yuting Gao; Xing Sun; Ke Li; Chunhua Shen

arXiv:2206.06829·cs.CV·June 20, 2022

TL;DR

This paper introduces DFFT, a decoder-free, fully transformer-based object detector that casts detection as an encoder-only, single-level, anchor-based dense prediction problem. On the MS COCO benchmark it is more accurate than DETR and RetinaNet at lower computational cost.

Contribution

The paper proposes a decoder-free, encoder-only transformer architecture for object detection, reducing training time and computational cost while maintaining high accuracy.

Findings

01

Outperforms DETR by 2.5% AP with 28% less computation and over 10x fewer training epochs.

02

Gains over 5.5% AP compared to RetinaNet while cutting computation by 70%.

03

Demonstrates high efficiency and accuracy on the MS COCO benchmark.

Abstract

Vision transformers (ViTs) are changing the landscape of object detection approaches. A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, at the price of a considerable computational burden for inference. A more subtle usage is the DETR family, which eliminates the need for many hand-designed components in object detection but introduces a decoder that takes an extra-long time to converge. As a result, transformer-based object detection cannot prevail in large-scale applications. To overcome these issues, we propose a novel decoder-free fully transformer-based (DFFT) object detector, achieving high efficiency in both training and inference stages, for the first time. We simplify object detection into an encoder-only single-level anchor-based dense prediction problem by centering…

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Feature Pyramid Network · Label Smoothing · Softmax · Absolute Position Encodings · Dropout · Adam · Residual Connection