Leveraging the Power of Data Augmentation for Transformer-based Tracking
Jie Zhao, Johan Edstedt, Michael Felsberg, Dong Wang, Huchuan Lu

TL;DR
This paper investigates the role of data augmentation in transformer-based visual tracking, revealing limitations of common strategies and proposing two novel, customized augmentation methods that improve performance and data efficiency.
Contribution
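The paper introduces two data augmentation techniques tailored to transformer-based tracking: random cropping with a dynamic search radius and simulated boundary samples, and token-level feature mixing that makes the tracker more robust to background interference.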
Findings
Common augmentations have limited impact on transformer trackers.
Proposed dynamic cropping improves boundary sample training.
Token-level feature mixing enhances model robustness against background interference.
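To make the cropping finding concrete, here is a minimal sketch of search-region cropping in which the allowed center shift is drawn per sample, so that some crops leave the target at or beyond the border. This is not the authors' implementation. The names `sample_search_crop`, `is_boundary_sample`, `max_shift`, and `search_scale`, and the sampling ranges, are hypothetical choices made only for illustration.

```python
import numpy as np

def sample_search_crop(box, max_shift, rng, search_scale=4.0):
    """Return (x0, y0, side) of a square search crop around a jittered center.

    box: (cx, cy, w, h) of the target in pixels.
    max_shift: largest center offset in units of target size (hypothetical knob).
    search_scale: crop side length in units of target size (assumed value).
    """
    cx, cy, w, h = box
    target_size = float(np.sqrt(w * h))
    side = search_scale * target_size
    # Jitter the crop center; a larger max_shift pushes the target toward the border.
    dx, dy = rng.uniform(-max_shift, max_shift, size=2) * target_size
    return cx + dx - side / 2, cy + dy - side / 2, side

def is_boundary_sample(box, crop):
    """True if the target box is not fully contained in the crop."""
    cx, cy, w, h = box
    x0, y0, side = crop
    return (cx - w / 2 < x0 or cy - h / 2 < y0
            or cx + w / 2 > x0 + side or cy + h / 2 > y0 + side)

rng = np.random.default_rng(0)
box = (100.0, 80.0, 20.0, 20.0)
# "Dynamic" is modeled here as drawing the shift limit per sample (an assumption).
max_shift = rng.uniform(0.5, 2.5)
crop = sample_search_crop(box, max_shift, rng)
print(crop, is_boundary_sample(box, crop))
```

With a fixed, small `max_shift` the target always sits near the crop center. Widening the range is one simple way to expose the model to targets near the crop boundary during training.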
Abstract
Owing to long-range correlation modeling and powerful pretrained models, transformer-based methods have brought a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited for tracking, but overlook that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers via systematic experiments, and reveal the limited effectiveness of these common strategies. Motivated by experimental observations, we then propose two data augmentation methods customized for tracking. First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples. Second, we propose a token-level feature mixing augmentation strategy, which strengthens the model against challenges like background interference.…
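The abstract does not say which tokens are mixed or where the mixed-in features come from, so the following is only a sketch of the general idea of token-level feature mixing. It swaps a random subset of one sample's tokens for the tokens at the same positions in another sample. The function `mix_tokens`, the `mix_ratio` parameter, and the uniform random choice of positions are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def mix_tokens(tokens, other_tokens, mix_ratio, rng):
    """Replace a random fraction of token rows with rows from another sample.

    tokens, other_tokens: arrays of shape (num_tokens, channels).
    mix_ratio: fraction of token positions to replace (hypothetical parameter).
    Returns the mixed tokens and a boolean mask of the replaced positions.
    """
    n = tokens.shape[0]
    k = int(round(mix_ratio * n))
    # Pick k distinct token positions uniformly at random (an assumption).
    idx = rng.choice(n, size=k, replace=False)
    mixed = tokens.copy()
    mixed[idx] = other_tokens[idx]
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mixed, mask

rng = np.random.default_rng(0)
search_tokens = np.zeros((16, 8))   # stand-in for search-region features
distractors = np.ones((16, 8))      # stand-in for features from another sample
mixed, mask = mix_tokens(search_tokens, distractors, 0.25, rng)
print(mask.sum())  # number of replaced token positions
```

The returned mask records which positions now hold foreign features, which a training loop could use, for example, to keep the mixed-in tokens away from the labeled target region.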
Taxonomy
Topics: Video Surveillance and Tracking Methods · Impact of Light on Environment and Health · Human Mobility and Location-Based Analysis
Methods: Focus
