Simple Cues Lead to a Strong Multi-Object Tracker
Jenny Seidenschwarz, Guillem Brasó, Victor Castro Serrano, Ismail Elezi, and Laura Leal-Taixé

TL;DR
This paper demonstrates that simple tracking-by-detection methods, when combined with basic appearance features and motion models, can achieve state-of-the-art multi-object tracking performance across multiple datasets.
Contribution
The authors show that a standard re-identification network combined with simple motion cues can match complex end-to-end models in multi-object tracking.
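The sketch below shows one common way to fuse an appearance cue and a motion cue into a single association cost. It is not the authors' implementation. It assumes that re-ID embeddings are plain vectors compared by cosine distance, and that the motion model is a constant-velocity extrapolation of the box center. The names `Track`, `combined_cost`, `motion_weight`, and `motion_scale` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical track state, for illustration only.
    embedding: list            # re-ID feature of the last matched detection
    center: tuple              # last known box center (x, y)
    velocity: tuple            # estimated per-frame displacement (dx, dy)


def cosine_distance(a, b):
    """Appearance cue: 1 - cosine similarity of two re-ID embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def predict_center(track, frames_ahead=1):
    """Motion cue: constant-velocity extrapolation of the last center."""
    return (track.center[0] + frames_ahead * track.velocity[0],
            track.center[1] + frames_ahead * track.velocity[1])


def combined_cost(track, det_embedding, det_center,
                  motion_weight=0.5, motion_scale=100.0):
    """Weighted sum of appearance and motion distances (hypothetical weights).

    motion_scale (pixels) roughly normalizes the center distance so that it
    is comparable to the cosine distance, which lies in [0, 2].
    """
    appearance = cosine_distance(track.embedding, det_embedding)
    px, py = predict_center(track)
    motion = math.hypot(det_center[0] - px, det_center[1] - py) / motion_scale
    return (1.0 - motion_weight) * appearance + motion_weight * motion
```

A detection that looks like the track and sits where the motion model expects it gets a cost near zero. A detection that differs in appearance or position is penalized by both terms. How the two cues are weighted, or whether they are combined at all, is a design choice that varies between trackers.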
Findings
Achieves state-of-the-art results on MOT17, MOT20, BDD100k, and DanceTrack datasets.
Simple cues combined with a re-identification network are highly effective.
Analysis of failure cases provides insights for further improvements.
Abstract
For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent approaches based on attention propose to learn the cues in a data-driven manner, showing impressive results. In this paper, we ask ourselves whether simple good old TbD methods are also capable of achieving the performance of end-to-end models. To this end, we propose two key ingredients that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyse its failure cases, and show that a combination of our appearance features with a simple motion model leads to strong tracking results. Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and…
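The detect-then-associate loop that the abstract describes can be sketched as follows. This is a generic illustration, not the paper's tracker. It uses greedy cheapest-first matching as a simpler stand-in for the Hungarian algorithm that many trackers use. The functions `associate` and `track_video`, the `cost_fn` callback, and the `max_cost` threshold are all hypothetical.

```python
def associate(cost, num_tracks, num_dets, max_cost):
    """Greedily match tracks (rows) to detections (columns), cheapest first.

    Pairs whose cost exceeds max_cost are never matched.
    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    pairs = sorted((cost[t][d], t, d)
                   for t in range(num_tracks) for d in range(num_dets))
    used_t, used_d, matches = set(), set(), []
    for c, t, d in pairs:
        if c > max_cost:
            break  # every remaining pair is at least this expensive
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    unmatched_tracks = [t for t in range(num_tracks) if t not in used_t]
    unmatched_dets = [d for d in range(num_dets) if d not in used_d]
    return matches, unmatched_tracks, unmatched_dets


def track_video(frames, cost_fn, max_cost=0.5):
    """frames: list of per-frame detection lists; returns per-frame track ids.

    cost_fn(track_state, detection) can be any appearance/motion distance.
    Unmatched tracks are simply kept here; real trackers age them out.
    """
    tracks = {}          # track id -> last matched detection
    next_id = 0
    outputs = []
    for dets in frames:
        ids = list(tracks)
        cost = [[cost_fn(tracks[i], det) for det in dets] for i in ids]
        matches, _, new_dets = associate(cost, len(ids), len(dets), max_cost)
        frame_ids = [None] * len(dets)
        for t, d in matches:           # extend existing tracks
            tracks[ids[t]] = dets[d]
            frame_ids[d] = ids[t]
        for d in new_dets:             # unmatched detections start new tracks
            tracks[next_id] = dets[d]
            frame_ids[d] = next_id
            next_id += 1
        outputs.append(frame_ids)
    return outputs
```

For example, with 1-D positions as toy detections and `abs(a - b)` as the cost, `track_video([[0.0, 5.0], [5.2, 0.1], [0.3, 9.0]], lambda a, b: abs(a - b))` keeps the two original identities when their order swaps in the second frame and starts a new track for the far-away detection in the third frame.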
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Human Pose and Action Recognition
