Beyond SOT: Tracking Multiple Generic Objects at Once
Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc Van Gool, Alina Kuznetsova

TL;DR
This paper introduces LaGOT, a large-scale benchmark for multi-object generic object tracking, and proposes a transformer-based tracker that improves efficiency and performance over existing methods.
Contribution
The paper presents LaGOT, the first large-scale multi-object GOT benchmark, and a transformer-based tracker that jointly processes multiple objects for improved robustness and speed.
Findings
Runs 4x faster than tracking each object independently when tracking 10 objects.
Outperforms existing single-object trackers on the LaGOT benchmark.
Sets a new state of the art on TrackingNet with 84.4% AUC.
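The summary does not define AUC. In single-object tracking benchmarks it usually means the area under the success plot: the fraction of frames whose predicted box overlaps the ground truth by more than an IoU threshold, averaged over thresholds from 0 to 1. The sketch below illustrates that metric under those assumptions. The function names, the (x, y, width, height) box format, and the 21 evenly spaced thresholds are illustrative choices, not taken from the paper or from the official TrackingNet evaluation code.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x, y, width, height) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred: Sequence[Box], gt: Sequence[Box],
                num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    The success rate at threshold t is the fraction of frames with IoU > t.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if num_thresholds < 2:
        raise ValueError("num_thresholds must be at least 2")
    overlaps: List[float] = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

For a sequence with several annotated targets, one plausible extension is to compute this score per target track and average the results. Whether LaGOT aggregates scores that way is not stated in this summary.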
Abstract
Generic Object Tracking (GOT) is the problem of tracking target objects, specified by bounding boxes in the first frame of a video. While the task has received much attention in the last decades, researchers have almost exclusively focused on the single object setting. Multi-object GOT benefits from a wider applicability, rendering it more attractive in real-world applications. We attribute the lack of research interest into this problem to the absence of suitable benchmarks. In this work, we introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows users to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation through joint tracking of multiple objects simultaneously. In addition, we propose a transformer-based GOT tracker baseline capable of joint processing of multiple…
Videos
Beyond SOT: Tracking Multiple Generic Objects at Once · YouTube
Taxonomy
Topics: Video Surveillance and Tracking Methods · UAV Applications and Optimization
