End-to-End Object Detection with Adaptive Clustering Transformer
Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

TL;DR
This paper introduces Adaptive Clustering Transformer (ACT), a novel method that reduces the computational cost of transformer-based object detection by adaptively clustering features, maintaining accuracy while lowering FLOPs.
Contribution
The paper proposes ACT, a transformer variant that replaces standard self-attention with a clustering-based approximation, reducing the computation cost of DETR-style detectors on high-resolution inputs.
Findings
ACT reduces the quadratic O(N^2) complexity of self-attention to O(NK), where K is the number of prototypes in each layer.
ACT maintains competitive accuracy while significantly lowering FLOPs.
The method is a drop-in replacement for the original self-attention module and requires no additional training.
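To make the O(N^2) versus O(NK) claim concrete, the following is a rough back-of-the-envelope sketch that counts multiply-accumulates for the attention score matrix only. The function names and the sizes (N = 1000, K = 100, feature dimension 256) are illustrative assumptions, not figures from the paper, and the cost of hashing and of the value aggregation is ignored.

```python
def full_attention_score_macs(n_tokens: int, dim: int) -> int:
    """Score matrix of standard self-attention: every query meets every key."""
    return n_tokens * n_tokens * dim


def prototype_attention_score_macs(n_tokens: int, n_prototypes: int, dim: int) -> int:
    """Score matrix when only the prototypes (not every query) meet every key."""
    return n_prototypes * n_tokens * dim


# Hypothetical sizes chosen only for illustration.
N, K, D = 1000, 100, 256
full = full_attention_score_macs(N, D)            # 1000 * 1000 * 256
approx = prototype_attention_score_macs(N, K, D)  # 100 * 1000 * 256
speedup = full / approx                           # equals N / K
```

Under these assumed sizes the score computation shrinks by a factor of N / K; the actual saving in a full detector depends on how many prototypes each layer ends up with.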
Abstract
End-to-end Object Detection with Transformer (DETR) proposes to perform object detection with Transformer and achieves comparable performance with two-stage object detectors like Faster-RCNN. However, DETR needs huge computational resources for training and inference due to the high-resolution spatial input. In this paper, a novel variant of transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input. ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction using the prototype-key interaction. ACT can reduce the quadratic O(N^2) complexity inside self-attention into O(NK), where K is the number of prototypes in each layer. ACT can be a drop-in module replacing the original self-attention module without any training. ACT achieves a good balance between…
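The idea in the abstract (hash the queries into clusters, attend once per cluster prototype, then share the result with every query in that cluster) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the hash family used here (signs of random projections), the choice of the cluster mean as prototype, and all function and parameter names are assumptions made for the example and may differ from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def lsh_cluster(queries, n_hashes=8, seed=0):
    """Assign each query a cluster id from the sign pattern of random projections.

    Assumption: sign-random-projection LSH is used purely for illustration.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((queries.shape[1], n_hashes))
    bits = (queries @ planes > 0).astype(np.int64)    # (N, n_hashes) of 0/1
    codes = bits @ (1 << np.arange(n_hashes))         # pack bits into one integer
    _, cluster_ids = np.unique(codes, return_inverse=True)
    return cluster_ids.ravel()                        # ids in 0..K-1


def clustered_attention(queries, keys, values, n_hashes=8, seed=0):
    """Approximate attention: K prototypes attend to keys instead of N queries."""
    cluster_ids = lsh_cluster(queries, n_hashes, seed)
    n_clusters = int(cluster_ids.max()) + 1

    # Prototype = mean of the queries that fell into the same hash bucket.
    sums = np.zeros((n_clusters, queries.shape[1]))
    np.add.at(sums, cluster_ids, queries)
    counts = np.bincount(cluster_ids, minlength=n_clusters)
    prototypes = sums / counts[:, None]

    # Prototype-key interaction: a (K, N) score matrix instead of (N, N).
    scores = prototypes @ keys.T / np.sqrt(queries.shape[1])
    prototype_out = softmax(scores) @ values

    # Every query reuses the output of its prototype.
    return prototype_out[cluster_ids], n_clusters
```

The number of clusters K is not fixed in advance in this sketch; it is whatever number of distinct hash codes the queries produce, so similar queries collapse into fewer prototypes. When all queries are identical, there is a single prototype and the result matches exact attention.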
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Convolution · Feedforward Network · Softmax · Residual Connection · Dense Connections · Label Smoothing · Detection Transformer
