Separable Self and Mixed Attention Transformers for Efficient Object Tracking
Goutam Yelluru Gopal, Maria A. Amer

TL;DR
This paper introduces SMAT, a lightweight transformer-based tracker that uses separable self-attention and separable mixed attention. It outperforms related lightweight trackers on multiple benchmarks while running at 37 fps on CPU.
Contribution
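The paper presents the first lightweight tracker to use a transformer-based backbone and a transformer-based head together: separable mixed attention fuses the template and search regions in the backbone, and efficient self-attention models global context in the head.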
Findings
Surpasses state-of-the-art lightweight trackers on multiple datasets.
Operates at 37 fps on CPU and 158 fps on GPU with 3.8M parameters.
Achieves 7.9% and 5.8% higher average overlap (AO) than two closely related trackers on GOT10k-test.
Abstract
The deployment of transformers for visual object tracking has shown state-of-the-art results on several benchmarks. However, the transformer-based models are under-utilized for Siamese lightweight tracking due to the computational complexity of their attention blocks. This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking. The proposed backbone utilizes the separable mixed attention transformers to fuse the template and search regions during feature extraction to generate superior feature encoding. Our prediction head performs global contextual modeling of the encoded features by leveraging efficient self-attention blocks for robust target state estimation. With these contributions, the proposed lightweight tracker deploys a transformer-based backbone and head module concurrently for the first time. Our ablation study testifies…
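The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a MobileViTv2-style separable attention, in which one softmax-weighted context vector replaces the full token-to-token attention matrix, and it assumes that "mixed" means attending over the concatenated template and search tokens. The function names, weight shapes, and token counts are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def separable_attention(tokens, w_i, w_k, w_v, w_o):
    """Separable attention over k tokens of width d (assumed formulation).

    tokens: k x d, w_i: d x 1, w_k / w_v / w_o: d x d.
    Cost grows linearly with k because no k x k matrix is formed.
    """
    # One scalar score per token, normalised across all tokens.
    scores = softmax([row[0] for row in matmul(tokens, w_i)])
    keys = matmul(tokens, w_k)
    width = len(keys[0])
    # Global context vector: score-weighted sum of the key projections.
    context = [sum(s * key[j] for s, key in zip(scores, keys)) for j in range(width)]
    # ReLU-activated values, gated element-wise by the shared context vector.
    values = [[max(0.0, v) for v in row] for row in matmul(tokens, w_v)]
    gated = [[c * v for c, v in zip(context, row)] for row in values]
    return matmul(gated, w_o)


def mixed_attention(template, search, weights):
    """Attend over template and search tokens jointly, then split them again."""
    joint = template + search  # concatenate along the token axis
    out = separable_attention(joint, *weights)
    return out[:len(template)], out[len(template):]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4  # hypothetical feature width
weights = (
    random_matrix(d, 1, rng),  # w_i
    random_matrix(d, d, rng),  # w_k
    random_matrix(d, d, rng),  # w_v
    random_matrix(d, d, rng),  # w_o
)
template = random_matrix(3, d, rng)  # 3 template tokens (toy size)
search = random_matrix(5, d, rng)    # 5 search tokens (toy size)
t_out, s_out = mixed_attention(template, search, weights)
```

Because the context vector is computed from the joint sequence, every output token in either region depends on both the template and the search tokens, which is the fusion effect the abstract attributes to the backbone.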
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Infrared Target Detection Methodologies
