Efficient Training for Visual Tracking with Deformable Transformer
Qingmao Wei, Guotian Zeng, Bi Zeng

TL;DR
This paper introduces DETRack, an efficient end-to-end visual tracking framework using deformable transformers, which reduces training time and inference computational cost while maintaining high accuracy.
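The sparsity claim behind deformable attention can be shown with a minimal sketch of single-head, single-scale deformable attention in the style of Deformable DETR. It assumes a scalar feature map and takes the sampling offsets and attention logits as direct inputs. In a real model these are predicted from the query by linear layers, and the sampled values are projected. Each query reads only `len(offsets)` bilinearly interpolated locations instead of every cell of the map. The function names are hypothetical, and this is not DETRack's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in feature-map pixel coordinates


def bilinear(feat: List[List[float]], x: float, y: float) -> float:
    """Bilinearly sample a scalar map feat[y][x] at continuous (x, y); out-of-range taps read as zero."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < w and 0 <= yi < h:
                # Weight of each neighbour falls off linearly with distance.
                val += (1 - abs(x - xi)) * (1 - abs(y - yi)) * feat[yi][xi]
    return val


def deformable_attention(feat: List[List[float]], ref: Point,
                         offsets: Sequence[Point], logits: Sequence[float]) -> float:
    """One query attends to K sampled points around its reference point (hypothetical helper)."""
    if len(offsets) != len(logits):
        raise ValueError("need one attention logit per sampling offset")
    # Softmax over the K sampling points only, not over all H * W positions.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    out = 0.0
    for (ox, oy), e in zip(offsets, exps):
        out += (e / z) * bilinear(feat, ref[0] + ox, ref[1] + oy)
    return out
```

Per query, the cost grows with the number of sampling points K rather than with the feature-map area. This is the intuition for why a deformable decoder head can need fewer GFLOPs than a dense convolutional head.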
Contribution
The paper proposes a novel, resource-efficient transformer-based tracking model with a new one-to-many label assignment and an auxiliary denoising technique for faster convergence.
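In single-object tracking, "one-to-many" label assignment can be illustrated roughly. The sketch below marks the top-k predicted boxes, ranked by IoU with the lone ground-truth box, as positives. One-to-one (Hungarian-style) matching would keep only one of them, which gives the decoder fewer positive training signals per frame. The IoU-based top-k criterion and the names `one_to_many_assign` and `k` are assumptions for illustration. The summary does not state DETRack's actual matching cost.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_to_many_assign(preds: List[Box], gt: Box, k: int = 3) -> List[int]:
    """Indices of the k predictions with the highest IoU to the single target (hypothetical helper).

    k = 1 reduces to a one-to-one assignment; ties are broken by lower index.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(preds)), key=lambda i: (-iou(preds[i], gt), i))
    return ranked[:k]


preds = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 9, 9)]
print(one_to_many_assign(preds, gt=(0, 0, 10, 10), k=2))  # → [0, 3]
```

The unselected predictions would be treated as negatives, for example in the classification loss.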
Findings
Achieves 72.9% AO on GOT-10k with only 20% of the training epochs
Reduces GFLOPs compared to existing transformer trackers
Maintains high tracking accuracy with lower computational cost
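AO (average overlap), the headline GOT-10k metric, is the mean IoU between predicted and ground-truth boxes over the evaluated frames. A reported AO of 72.9% therefore corresponds to a mean IoU of 0.729. The minimal sketch below assumes boxes in (x, y, w, h) form with a top-left origin. Reported numbers should come from the official GOT-10k evaluation, which this does not replicate.

```python
from typing import Sequence, Tuple

BoxXYWH = Tuple[float, float, float, float]  # (x, y, w, h), assumed annotation format


def iou_xywh(a: BoxXYWH, b: BoxXYWH) -> float:
    """IoU of two boxes given as top-left corner plus width and height."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: Sequence[BoxXYWH], gts: Sequence[BoxXYWH]) -> float:
    """Mean per-frame IoU between predictions and ground truth."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equal-length, non-empty prediction and ground-truth lists")
    return sum(iou_xywh(p, g) for p, g in zip(preds, gts)) / len(gts)


# Frame 1 is a perfect match (IoU 1.0); frame 2 is shifted by half a box (IoU 1/3).
ao = average_overlap([(0, 0, 10, 10), (5, 0, 10, 10)], [(0, 0, 10, 10)] * 2)
print(round(ao, 4))  # → 0.6667
```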
Abstract
Recent Transformer-based visual tracking models have shown superior performance. However, prior works have been resource-intensive, requiring long GPU training times and incurring high GFLOPs during inference because of inefficient training methods and convolution-based target heads. This heavy resource use makes them unsuitable for real-world applications. In this paper, we present DETRack, a streamlined end-to-end visual object tracking framework. Our framework uses an efficient encoder-decoder structure in which the deformable transformer decoder, acting as the target head, achieves higher sparsity than traditional convolutional heads, resulting in lower GFLOPs. For training, we introduce a novel one-to-many label assignment and an auxiliary denoising technique, which significantly accelerate the model's convergence. Comprehensive experiments affirm the effectiveness and…
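DN-DETR-style box jittering is one way to illustrate an auxiliary denoising task. Noised copies of the ground-truth box are fed to the decoder as extra queries, and the model is trained to reconstruct the clean box. This gives the decoder an easy, stable regression target early in training. The sketch below only generates the noised boxes. The noise scheme, the `box_noise_scale` parameter, and the function name are assumptions borrowed from DN-DETR, not details the summary confirms for DETRack.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), assumed format


def make_denoising_boxes(gt: Box, num_queries: int = 4,
                         box_noise_scale: float = 0.4, seed: int = 0) -> List[Box]:
    """Return jittered copies of `gt` for denoising queries (hypothetical helper)."""
    if not 0.0 <= box_noise_scale < 1.0:
        raise ValueError("box_noise_scale must be in [0, 1) so widths stay positive")
    rng = random.Random(seed)
    cx, cy, w, h = gt
    noised = []
    for _ in range(num_queries):
        # Shift the centre by at most box_noise_scale * (w/2, h/2).
        ncx = cx + rng.uniform(-1.0, 1.0) * box_noise_scale * w / 2
        ncy = cy + rng.uniform(-1.0, 1.0) * box_noise_scale * h / 2
        # Rescale width and height within [(1 - s), (1 + s)] of the original.
        nw = w * (1.0 + rng.uniform(-1.0, 1.0) * box_noise_scale)
        nh = h * (1.0 + rng.uniform(-1.0, 1.0) * box_noise_scale)
        noised.append((ncx, ncy, nw, nh))
    return noised


queries = make_denoising_boxes((50.0, 40.0, 20.0, 10.0), num_queries=4)
# Each noised query would be trained to regress back to the clean box (50, 40, 20, 10).
```

Because the reconstruction target is known exactly, this auxiliary loss does not depend on the label assignment. That independence is the usual argument for why denoising speeds up convergence.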
