Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Ning Wang; Wengang Zhou; Jie Wang; Houqiang Li

arXiv:2103.11681 · cs.CV · March 25, 2021 · 52 cites

TL;DR

This paper introduces a transformer-based framework that exploits temporal context across video frames to make visual object tracking more robust and accurate, and reports results that surpass current top-performing trackers on standard tracking benchmarks.

Contribution

It separates the transformer encoder and decoder into two parallel branches within a Siamese-like tracking pipeline: the encoder reinforces template features through attention, and the decoder propagates tracking cues from previous templates to the current frame.
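To make the encoder's attention-based feature reinforcement more concrete, here is a minimal pure-Python sketch. It assumes that template features from several previous frames are flattened into one list of tokens, that every token attends to all of them with scaled dot-product self-attention, and that the attended result is added back as a residual. The helper names (`softmax`, `attention`, `encode_templates`) are hypothetical, and the learned projections and normalization of a real transformer layer are omitted; this is an illustration of the idea, not the authors' implementation (that lives in the official repository listed under Code & Models).

```python
import math
from typing import List

# A matrix is a list of token vectors (rows = tokens, columns = channels).
Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    # Scaled dot-product attention: softmax(Q K^T / sqrt(C)) V, one query at a time.
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def encode_templates(templates: Matrix) -> Matrix:
    """Hypothetical encoder step: each template token attends to all template
    tokens (self-attention with Q = K = V = templates) and the attended vector
    is added back as a residual. No learned weights, no normalization."""
    attended = attention(templates, templates, templates)
    return [[t + a for t, a in zip(tok, att)] for tok, att in zip(templates, attended)]


if __name__ == "__main__":
    # Three template tokens (e.g. taken from different past frames), 2 channels each.
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in encode_templates(tokens):
        print([round(x, 3) for x in row])
```

Pooling tokens from several past frames into `templates` lets each token borrow evidence from similar tokens in other frames, which is one way to read how attention could yield a more robust template than any single frame provides.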

Findings

01

With the proposed transformer, a simple Siamese matching tracker outperforms current top-performing trackers on benchmark datasets.

02

Combined with a recent discriminative tracking pipeline, the transformer sets several new state-of-the-art records on prevalent tracking benchmarks.

03

The transformer-assisted tracking framework is trained end to end.

Abstract

In video object tracking, there exist rich temporal contexts among successive frames, which have been largely overlooked in existing trackers. In this work, we bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking. Different from classic usage of the transformer in natural language processing tasks, we separate its encoder and decoder into two parallel branches and carefully design them within the Siamese-like tracking pipelines. The transformer encoder promotes the target templates via attention-based feature reinforcement, which benefits the high-quality tracking model generation. The transformer decoder propagates the tracking cues from previous templates to the current frame, which facilitates the object searching process. Our transformer-assisted tracking framework is neat and trained in an…
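To illustrate how a decoder could propagate tracking cues from previous templates to the current frame, here is a minimal pure-Python sketch. It assumes that search-frame tokens act as attention queries over template tokens, and that the same attention weights carry two kinds of cues: the template features themselves and a per-token template mask (for example, a Gaussian-shaped label centered on the target). The function names, the mask input, and the fusion rule (gate the search features by the propagated mask, then add the propagated template features) are hypothetical simplifications rather than the authors' exact design; learned projections and normalization are omitted.

```python
import math
from typing import List, Tuple

# A matrix is a list of token vectors (rows = tokens, columns = channels).
Matrix = List[List[float]]


def attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    # Row-wise softmax of scaled dot products: one weight row per query token.
    scale = 1.0 / math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def decode_search(
    search: Matrix, templates: Matrix, template_mask: List[float]
) -> Tuple[Matrix, List[float]]:
    """Hypothetical decoder step: search tokens query the template tokens.

    The same attention weights carry two cues from the templates:
    - feature cues: a weighted sum of template feature vectors;
    - spatial cues: a weighted sum of the per-token template mask, giving
      each search token a soft 'target-likeness' score.
    """
    weights = attention_weights(search, templates)
    propagated_mask = [sum(w * m for w, m in zip(row, template_mask)) for row in weights]
    propagated_feat = [
        [sum(w * t[j] for w, t in zip(row, templates)) for j in range(len(templates[0]))]
        for row in weights
    ]
    # Assumed fusion rule: scale each search token by its propagated mask score,
    # then add the propagated template features as a residual.
    fused = [
        [s * m + f for s, f in zip(s_tok, f_tok)]
        for s_tok, f_tok, m in zip(search, propagated_feat, propagated_mask)
    ]
    return fused, propagated_mask


if __name__ == "__main__":
    templates = [[1.0, 0.0], [0.0, 1.0]]  # two template tokens
    template_mask = [1.0, 0.0]            # only the first token lies on the target
    search = [[2.0, 0.0], [0.0, 2.0]]     # two search-region tokens
    fused, mask = decode_search(search, templates, template_mask)
    print([round(m, 3) for m in mask])
```

Because the attention weights in each row are non-negative and sum to one, every propagated mask score is a convex combination of the template mask values, so it stays within their range and can act as a soft spatial prior on the search region.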

Code & Models

Repositories

594422814/TransformerTrack
PyTorch · Official

Taxonomy

Topics: Video Surveillance and Tracking Methods · IoT-based Smart Home Systems · Human Pose and Action Recognition