ENACT: Entropy-based Clustering of Attention Input for Reducing the Computational Needs of Object Detection Transformers
Giorgos Savathrakis, Antonis Argyros

TL;DR
ENACT clusters transformer inputs by entropy to significantly reduce the computational resources that object detection transformers require, with minimal accuracy loss.
Contribution
This work presents a plug-in module that clusters attention inputs based on their entropy, reducing GPU memory usage when training vision transformers for object detection.
Findings
Memory requirements are reduced during training.
Detection accuracy is only slightly degraded.
Applicable to multiple transformer architectures: evaluated with three detection transformers on COCO.
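To make the memory finding concrete, the following back-of-the-envelope sketch counts the bytes held by the attention weights alone. The function name, the token count, the head count, and the number of clusters are all illustrative assumptions, not figures from the paper, and the sketch assumes that only the key side of attention is shortened by clustering.

```python
def attention_weight_bytes(n_queries, n_keys, heads=8, bytes_per_element=4):
    """Bytes needed to store one layer's attention weights for one image.

    Each head stores an (n_queries x n_keys) matrix of weights.
    bytes_per_element=4 corresponds to float32.
    """
    return n_queries * n_keys * heads * bytes_per_element


# Hypothetical example: a 100 x 100 feature map flattened to 10,000 tokens.
tokens = 100 * 100
full = attention_weight_bytes(tokens, tokens)

# Assumption: clustering shortens the key side to 2,500 representatives.
clustered = attention_weight_bytes(tokens, 2_500)

print(f"full attention:      {full / 1e9:.1f} GB")
print(f"clustered attention: {clustered / 1e9:.1f} GB")
```

Under these assumed numbers the weight matrix shrinks in proportion to the reduction in sequence length on the clustered side. Activations, gradients, and other buffers are not counted here.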
Abstract
Transformers achieve competitive precision on vision-based object detection. However, they require considerable computational resources because the attention weights grow quadratically with the input size. In this work, we propose to cluster the transformer input on the basis of its entropy, which tends to be similar across pixels of the same object. This is expected to reduce GPU usage during training while maintaining reasonable accuracy. The idea is implemented as a module called ENtropy-based Attention Clustering for detection Transformers (ENACT), which serves as a plug-in to any transformer network based on multi-head self-attention. Experiments on the COCO object detection dataset with three detection transformers show that memory requirements are reduced while detection accuracy degrades only slightly. The code of the ENACT…
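The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of one way entropy-based grouping could work, not the ENACT implementation. It assumes that each token's entropy is computed from a softmax over its feature vector, that neighbouring tokens are merged while their entropy changes by less than a threshold `tol`, and that each cluster is represented by the mean of its tokens. The names `token_entropy`, `cluster_by_entropy`, and `tol` are hypothetical.

```python
import numpy as np


def token_entropy(x):
    """Shannon entropy of a softmax taken over each token's features."""
    z = x - x.max(axis=-1, keepdims=True)  # shift for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)


def cluster_by_entropy(x, tol=0.05):
    """Merge consecutive tokens whose entropy differs by at most `tol`.

    x: array of shape (n_tokens, n_features).
    Returns (centers, labels): centers has shape (n_clusters, n_features)
    and labels maps each token to its cluster index.
    """
    h = token_entropy(x)
    # A new cluster starts at token 0 and wherever entropy jumps by more than tol.
    starts = np.concatenate(([True], np.abs(np.diff(h)) > tol))
    labels = np.cumsum(starts) - 1
    k = int(labels[-1]) + 1
    # Sum the tokens of each cluster, then divide by the cluster size.
    sums = np.zeros((k, x.shape[1]))
    np.add.at(sums, labels, x)
    counts = np.bincount(labels, minlength=k)
    return sums / counts[:, None], labels


# Toy input: two flat tokens followed by two peaked tokens.
x = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0]])
centers, labels = cluster_by_entropy(x)
print(labels)   # cluster index per token
print(centers)  # one mean vector per cluster
```

The shortened sequence `centers` would then stand in for the original tokens as attention input, which is where the memory saving would come from under these assumptions.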
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Infrared Target Detection Methodologies · Neural Networks and Applications
Methods: Softmax · Attention Is All You Need
