End-to-End Object Detection with Adaptive Clustering Transformer

Minghang Zheng; Peng Gao; Renrui Zhang; Kunchang Li; Xiaogang Wang; Hongsheng Li; Hao Dong

arXiv:2011.09315 · cs.CV · October 19, 2021 · 117 cites



TL;DR

This paper introduces Adaptive Clustering Transformer (ACT), a novel method that reduces the computational cost of transformer-based object detection by adaptively clustering features, maintaining accuracy while lowering FLOPs.

Contribution

The paper proposes ACT, a transformer variant that replaces standard self-attention with an efficient clustering-based approximation, lowering the computational cost of processing high-resolution inputs.
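ACT keeps the query, key and value projections of a trained layer and changes only how the attention itself is evaluated, so swapping it in amounts to changing one function argument. The plain-Python sketch below illustrates that interface under stated assumptions. `self_attention_layer`, `grid_cluster_attention` and the other names are hypothetical. The grid-rounding grouping is a deliberately simple stand-in for the LSH clustering that ACT actually uses, chosen only to keep the example short.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Any attention implementation with this signature can be plugged into the layer.
AttnFn = Callable[[Matrix, Matrix, Matrix], Matrix]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attend(q: List[float], k: Matrix, v: Matrix) -> List[float]:
    """Scaled dot-product attention for a single query vector."""
    scores = [sum(x * y for x, y in zip(q, key)) / math.sqrt(len(q)) for key in k]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * row[j] for wi, row in zip(w, v)) / z for j in range(len(v[0]))]


def exact_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    return [attend(row, k, v) for row in q]


def grid_cluster_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Hypothetical stand-in for ACT (NOT the paper's LSH): queries that round
    to the same grid cell share one prototype, their mean."""
    groups = {}
    for i, row in enumerate(q):
        groups.setdefault(tuple(round(x, 1) for x in row), []).append(i)
    out: Matrix = [[] for _ in q]
    for members in groups.values():
        proto = [sum(q[i][j] for i in members) / len(members) for j in range(len(q[0]))]
        shared = attend(proto, k, v)  # one attention call per prototype
        for i in members:
            out[i] = shared
    return out


def self_attention_layer(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                         attn_fn: AttnFn = exact_attention) -> Matrix:
    """The trained projections stay fixed; only attn_fn is swapped."""
    return attn_fn(matmul(x, wq), matmul(x, wk), matmul(x, wv))
```

Passing `attn_fn=grid_cluster_attention` leaves the weights untouched, which mirrors the abstract's statement that ACT can replace self-attention without any training. Any change in output comes only from queries that share a prototype.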

Findings

01

ACT reduces the quadratic O(N^2) cost of self-attention to O(NK), where K is the number of prototypes; for a fixed K this is linear in the number of input tokens N.

02

ACT maintains competitive accuracy while significantly lowering FLOPs.

03

The method is a drop-in replacement for existing self-attention modules and requires no retraining.
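A rough count of attention-score entries per layer makes the complexity finding concrete. In this worked example, Q holds the N queries and P holds the K prototypes, both with feature dimension d. The key matrix is written X so that K keeps its meaning as the number of prototypes. The count leaves out the cost of computing the LSH codes, which is linear in N. The numbers are hypothetical: an 800 × 1216 input seen through DETR's stride-32 backbone output, with K = 100 chosen for illustration rather than taken from the paper.

```latex
\begin{aligned}
\text{standard: } & Q X^{\top} \in \mathbb{R}^{N \times N}
  && \Rightarrow\ O(N^{2} d) \\
\text{ACT: } & P X^{\top} \in \mathbb{R}^{K \times N}
  && \Rightarrow\ O(N K d) \\
\text{saving: } & \frac{N^{2} d}{N K d} = \frac{N}{K} \\
\text{example: } & N = \tfrac{800}{32} \cdot \tfrac{1216}{32} = 25 \cdot 38 = 950,
  \quad K = 100 \\
& N^{2} = 902{,}500 \quad \text{vs.} \quad N K = 95{,}000
  \quad (\approx 9.5\times \text{ fewer score entries})
\end{aligned}
```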

Abstract

End-to-end Object Detection with Transformer (DETR) proposes to perform object detection with a Transformer and achieves performance comparable to two-stage object detectors such as Faster R-CNN. However, DETR needs huge computational resources for training and inference due to the high-resolution spatial input. In this paper, a novel variant of the transformer named Adaptive Clustering Transformer (ACT) is proposed to reduce the computation cost for high-resolution input. ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction using the prototype-key interaction. ACT can reduce the quadratic O(N^2) complexity inside self-attention to O(NK), where K is the number of prototypes in each layer. ACT can be a drop-in module replacing the original self-attention module without any training. ACT achieves a good balance between…
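The pipeline the abstract describes has four steps: hash the queries with LSH, average each bucket into a prototype, attend with the prototypes instead of the individual queries, and copy each result back to the bucket's members. The plain-Python sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' implementation in the linked repository. Several details are assumptions made for the sketch: the E2LSH-style hash floor((a·x + b) / r), the parameter names `n_rounds` and `bucket_width`, and the use of the cluster mean as the prototype. Because the number of occupied buckets depends on the data, K is adaptive rather than fixed.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over all N keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z
            for j in range(len(values[0]))]


def exact_attention(queries, keys, values):
    """Baseline: N x N query-key scores."""
    return [attend(q, keys, values) for q in queries]


def lsh_cluster(queries: List[Vector], n_rounds: int = 4,
                bucket_width: float = 1.0, seed: int = 0) -> List[List[int]]:
    """Assumed E2LSH-style hashing: each round maps x to floor((a.x + b) / r).
    Queries whose codes agree in every round share a cluster."""
    rng = random.Random(seed)
    dim = len(queries[0])
    hashes = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
               rng.uniform(0.0, bucket_width)) for _ in range(n_rounds)]
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, q in enumerate(queries):
        code = tuple(math.floor((dot(a, q) + b) / bucket_width) for a, b in hashes)
        buckets[code].append(i)
    return list(buckets.values())


def act_attention(queries, keys, values, **lsh_kwargs):
    """ACT-style approximation: attend once per prototype (cluster mean) and
    copy that output to every member query, giving K x N scores in total."""
    dim = len(queries[0])
    out: List[Vector] = [[] for _ in queries]
    for members in lsh_cluster(queries, **lsh_kwargs):
        prototype = [sum(queries[i][j] for i in members) / len(members)
                     for j in range(dim)]
        shared = attend(prototype, keys, values)
        for i in members:
            out[i] = shared
    return out
```

Here K equals `len(lsh_cluster(queries))`, which is set by the data and the hash parameters rather than fixed in advance. A smaller `bucket_width` or more `n_rounds` splits the queries into more, tighter clusters. This generally brings the output closer to exact attention at a higher cost.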


Code & Models

Repositories

gaopengcuhk/SMCA-DETR
PyTorch · Official


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Convolution · Feedforward Network · Softmax · Residual Connection · Dense Connections · Label Smoothing · Detection Transformer