TransTrack: Multiple Object Tracking with Transformer
Peize Sun, Jinkun Cao, Yi Jiang, Rufeng Zhang, Enze Xie, Zehuan Yuan, Changhu Wang, Ping Luo

TL;DR
TransTrack is a transformer-based multiple object tracker that performs detection and association in a single shot, reaching 74.5% MOTA on MOT17 and 64.5% MOTA on MOT20.
Contribution
It presents a novel joint-detection-and-tracking paradigm using transformers, simplifying multi-step tracking methods and improving efficiency.
Findings
Achieves 74.5% MOTA on MOT17
Achieves 64.5% MOTA on MOT20
Provides a unified detection and tracking framework
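For readers unfamiliar with the metric reported above, MOTA (Multiple Object Tracking Accuracy) is conventionally defined as one minus the ratio of total errors (false negatives, false positives, identity switches) to the number of ground-truth objects, summed over all frames. The sketch below computes it from aggregate counts. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Compute MOTA from error counts summed over all frames.

    MOTA = 1 - (FN + FP + IDSW) / GT
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_score = mota(false_negatives=100, false_positives=50,
                     id_switches=10, num_ground_truth=1000)
print(f"MOTA = {example_score:.2%}")
```

Because the error count can exceed the number of ground-truth objects, MOTA can be negative for a very poor tracker.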
Abstract
In this work, we propose TransTrack, a simple but efficient scheme to solve the multiple object tracking problem. TransTrack leverages the transformer architecture, which is an attention-based query-key mechanism. It applies object features from the previous frame as a query of the current frame and introduces a set of learned object queries to enable detecting newly appearing objects. It builds up a novel joint-detection-and-tracking paradigm by accomplishing object detection and object association in a single shot, simplifying complicated multi-step settings in tracking-by-detection methods. On the MOT17 and MOT20 benchmarks, TransTrack achieves 74.5% and 64.5% MOTA, respectively, competitive with state-of-the-art methods. We expect TransTrack to provide a novel perspective for multiple object tracking. The code is available at https://github.com/PeizeSun/TransTrack.
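The query-key idea in the abstract can be illustrated with a minimal NumPy sketch: two sets of queries (object features carried over from the previous frame, and a set of object queries for new objects) each attend to the current frame's features. This is not the authors' implementation. It assumes single-head scaled dot-product attention, random stand-ins for all features and for the learned queries, and hypothetical names and sizes throughout; it omits box prediction and the step that associates the two sets of outputs.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries: np.ndarray, keys: np.ndarray,
                    values: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention (no learned projections)."""
    d_model = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_model)   # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ values                        # (num_queries, d_model)


rng = np.random.default_rng(0)
d_model = 16                 # hypothetical feature width
num_locations = 50           # hypothetical number of flattened frame locations

# Stand-in for the current frame's encoded features (keys and values).
frame_features = rng.normal(size=(num_locations, d_model))

# Stand-in for object features from the previous frame, used as queries.
track_queries = rng.normal(size=(3, d_model))

# Stand-in for the learned object queries that pick up new objects.
# In a trained model these would be learned parameters, not random draws.
object_queries = rng.normal(size=(5, d_model))

# Both query sets attend to the same current-frame features.
track_outputs = cross_attention(track_queries, frame_features, frame_features)
detection_outputs = cross_attention(object_queries, frame_features, frame_features)

print(track_outputs.shape, detection_outputs.shape)
```

In a full tracker, each output vector would be decoded into a bounding box, and the boxes from the two query sets would then be matched to carry identities forward; that part is left out here.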
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Fire Detection and Safety Systems
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Multi-Head Attention · Dropout · Softmax · Dense Connections · Label Smoothing · Attention Is All You Need
