Do Different Tracking Tasks Require Different Appearance Models?
Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H.S. Torr, Luca Bertinetto

TL;DR
UniTrack addresses five tracking tasks within a single framework built around one task-agnostic appearance model, reducing the need for specialized solutions and enabling comprehensive evaluation of self-supervised methods.
Contribution
This work presents UniTrack, a framework that combines a single appearance model with multiple task-specific heads that require no training, unifying various tracking tasks and facilitating broader evaluation of self-supervised models.
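To make the idea of a training-free head on top of a shared appearance model concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes the appearance model has already produced one embedding vector per object, and it swaps in plain cosine similarity with greedy matching as a stand-in for whatever association procedure UniTrack actually uses. The names `cosine_similarity`, `associate`, and `min_similarity` are hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(
    track_embs: Sequence[Sequence[float]],
    det_embs: Sequence[Sequence[float]],
    min_similarity: float = 0.5,  # hypothetical threshold, not from the paper
) -> Dict[int, int]:
    """Greedily match existing tracks to new detections by appearance similarity.

    Returns a dict mapping track index -> detection index. The function has
    no learnable parameters: it only consumes embeddings produced elsewhere.
    """
    # Score every (track, detection) pair.
    pairs: List[Tuple[float, int, int]] = [
        (cosine_similarity(t, d), ti, di)
        for ti, t in enumerate(track_embs)
        for di, d in enumerate(det_embs)
    ]
    # Visit pairs from most to least similar.
    pairs.sort(key=lambda p: p[0], reverse=True)

    matches: Dict[int, int] = {}
    used_dets: Set[int] = set()
    for sim, ti, di in pairs:
        if sim < min_similarity:
            break  # all remaining pairs are even less similar
        if ti in matches or di in used_dets:
            continue  # each track and each detection is used at most once
        matches[ti] = di
        used_dets.add(di)
    return matches


# Hand-written vectors standing in for appearance-model outputs.
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches = associate(tracks, detections)
```

Because the head only consumes embeddings, replacing the appearance model (supervised or self-supervised) would leave this function unchanged, which is the property that lets one framework compare different representations across tasks.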
Findings
Most tracking tasks can be addressed with UniTrack.
The same appearance model performs competitively across tasks.
The framework enables evaluation of recent self-supervised methods.
Abstract
Tracking objects of interest in a video is one of the most popular and widely applicable problems in computer vision. However, over the years, a Cambrian explosion of use cases and benchmarks has fragmented the problem into a multitude of different experimental setups. As a consequence, the literature has fragmented too, and now novel approaches proposed by the community are usually specialised to fit only one specific setup. To understand to what extent this specialisation is necessary, in this work we present UniTrack, a solution to address five different tasks within the same framework. UniTrack consists of a single and task-agnostic appearance model, which can be learned in a supervised or self-supervised fashion, and multiple "heads" that address individual tasks and do not require training. We show how most tracking tasks can be solved within this framework, and that the same…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
