Unified Transformer Tracker for Object Tracking
Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

TL;DR
The paper introduces the Unified Transformer Tracker (UTT), which handles both Single Object Tracking (SOT) and Multiple Object Tracking (MOT) within a single framework and can be trained on large-scale tracking datasets.
Contribution
It proposes a novel unified transformer-based model that can be trained jointly on SOT and MOT datasets, unifying tracking tasks in a single paradigm.
Findings
UTT achieves competitive results on multiple benchmarks.
It can be trained on combined SOT and MOT datasets.
It applies to both single-object and multi-object tracking scenarios.
Abstract
As an important area in computer vision, object tracking has formed two separate communities that respectively study Single Object Tracking (SOT) and Multiple Object Tracking (MOT). However, current methods in one tracking scenario are not easily adapted to the other due to the divergent training datasets and tracking objects of both tasks. Although UniTrack \cite{wang2021different} demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking. In this work, we present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm. A track transformer is developed in our UTT to track the target in both SOT and MOT. The correlation between the target and tracking frame features is…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Infrared Target Detection Methodologies
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Dropout · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Label Smoothing · Multi-Head Attention
