High-Performance Transformer Tracking
Xin Chen, Bin Yan, Jiawen Zhu, Huchuan Lu, Xiang Ruan, Dong Wang

TL;DR
This paper introduces TransT, a transformer-based tracking method that replaces correlation with attention-based feature fusion and achieves high accuracy on seven tracking datasets.
Contribution
The paper presents a novel attention-based feature fusion network for tracking that replaces traditional correlation. It then extends this network into TransT-M, which adds a multi-template design and an IoU prediction head for further gains.
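An IoU prediction head outputs an estimate of the intersection-over-union between the tracker's predicted box and the true target box. The sketch below computes that overlap quantity, which we assume is what such a head is trained to approximate. It assumes axis-aligned boxes in (x1, y1, x2, y2) corner format. The name `box_iou` is illustrative and does not come from the authors' code. This is not the TransT-M head itself.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (hypothetical helper)."""
    # Width and height of the overlap rectangle, clamped to zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


# Usage: two 2x2 boxes overlapping in a 1x1 square give IoU = 1 / 7.
print(box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

A predicted IoU close to 1 would indicate a confident, well-aligned box. Whether and how TransT-M uses this score at inference time is described in the paper and is not reproduced here.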
Findings
TransT achieves state-of-the-art results on seven datasets.
The attention-based fusion outperforms correlation-based methods.
TransT-M further improves accuracy with a multi-template design and an IoU prediction head.
Abstract
Correlation plays a critical role in the tracking field, especially in the recently popular Siamese-based trackers. The correlation operation is a simple fusion method that considers the similarity between the template and the search region. However, correlation is a local linear matching process that loses semantic information and easily falls into a local optimum, which may be the bottleneck in designing high-accuracy tracking algorithms. In this work, to explore whether a better feature fusion method than correlation exists, we present a novel attention-based feature fusion network inspired by the transformer. This network effectively combines the template and search region features using attention. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. First, we present…
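The abstract contrasts correlation, described as local linear matching, with attention-based fusion built from an ego-context augment (ECA) module and a cross-feature augment (CFA) module. The sketch below places the two side by side on tiny 1-D token sequences in plain Python. It is a simplified illustration under stated assumptions, not the authors' implementation. It uses single-head scaled dot-product attention with a residual connection. It omits multi-head splitting, learned projections, positional encodings, and any normalization or feed-forward layers. The names `eca`, `cfa`, and `fusion_layer` are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[Vec]  # one feature vector (token) per spatial position


def dot(u: Vec, v: Vec) -> float:
    return sum(x * y for x, y in zip(u, v))


def correlation(template: Mat, search: Mat) -> Vec:
    """1-D sliding cross-correlation: one linear similarity score per offset."""
    k = len(template)
    return [
        sum(dot(template[j], search[i + j]) for j in range(k))
        for i in range(len(search) - k + 1)
    ]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Mat, keys: Mat, values: Mat) -> Mat:
    """Single-head scaled dot-product attention without learned projections."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Each query attends to every key, not just a local window.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def add(a: Mat, b: Mat) -> Mat:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def eca(x: Mat) -> Mat:
    """Hypothetical ECA sketch: self-attention over one branch plus a residual."""
    return add(x, attention(x, x, x))


def cfa(x: Mat, other: Mat) -> Mat:
    """Hypothetical CFA sketch: x queries the other branch, plus a residual."""
    return add(x, attention(x, other, other))


def fusion_layer(template: Mat, search: Mat) -> Tuple[Mat, Mat]:
    """One hypothetical fusion layer: ECA on each branch, then CFA across branches."""
    t, s = eca(template), eca(search)
    return cfa(t, s), cfa(s, t)


# Usage with toy features.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
print(correlation(template, search))  # one score per offset
new_t, new_s = fusion_layer(template, search)
print(len(new_t), len(new_s))  # token counts are preserved per branch
```

Correlation produces one linearly weighted score per offset from a fixed-size window. Attention instead reweights every token of the other branch with softmax-normalized similarities, so each output mixes information from the whole template or search region.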
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Mobility and Location-Based Analysis · Air Quality Monitoring and Forecasting
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Label Smoothing · Absolute Position Encodings · Layer Normalization · Residual Connection
