Multi-Granularity Language-Guided Training for Multi-Object Tracking
Yuhao Li, Jiale Cao, Muzammal Naseer, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

TL;DR
This paper introduces LG-MOT, a multi-object tracking framework that integrates multi-granularity language information with visual features to improve robustness and achieve state-of-the-art results across multiple benchmarks.
Contribution
The work proposes a novel multi-modal approach that leverages scene- and instance-level language descriptions to enhance visual feature discrimination in multi-object tracking.
Findings
Achieves state-of-the-art performance on MOT17, DanceTrack, and SportsMOT datasets.
Improves target association on DanceTrack by 2.2% in IDF1 score.
Demonstrates strong cross-domain generalizability of the proposed method.
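For readers unfamiliar with the IDF1 score cited above: it is the F1 score over identity-matched detections, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The minimal Python sketch below only illustrates that formula. The function name `idf1` and the counts are hypothetical and are not taken from the paper.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    idtp: detections assigned the correct identity
    idfp: predicted detections with no correct identity match
    idfn: ground-truth detections with no correct identity match
    """
    denom = 2 * idtp + idfp + idfn
    # Return 0.0 when there is nothing to score, to avoid dividing by zero.
    return 2 * idtp / denom if denom else 0.0


# Made-up counts, for illustration only (not results from the paper).
example_score = idf1(idtp=600, idfp=200, idfn=200)
print(f"IDF1 = {example_score:.3f}")
```

Because the score rewards keeping the same identity on the same target over time, it is commonly read as a measure of association quality rather than of detection quality.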
Abstract
Most existing multi-object tracking methods learn visual tracking features by maximizing dissimilarities between different instances and maximizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging, especially in cases of environmental interference such as occlusion, blur, and domain variance. In this work, we argue that multi-modal language-driven features provide complementary information to classical visual features, thereby improving robustness to such environmental interference. To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain…
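The purely visual feature learning scheme described at the start of the abstract can be sketched with a generic triplet-style objective: embeddings of the same instance are pulled together and embeddings of different instances are pushed apart. This is a minimal illustration of that general idea, not LG-MOT's actual loss. The function names, the margin value, and the toy vectors are all assumptions made for the example.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Hinge loss on cosine similarities (hypothetical helper).

    The loss is zero once the same-instance pair (anchor, positive) is more
    similar than the different-instance pair (anchor, negative) by `margin`.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Toy 2-D embeddings, for illustration only.
anchor = [1.0, 0.0]
same_instance = [0.9, 0.1]       # same target in another frame
easy_negative = [0.0, 1.0]       # clearly different appearance
hard_negative = [1.0, 0.1]       # look-alike target, e.g. a similar uniform

easy_loss = triplet_loss(anchor, same_instance, easy_negative)
hard_loss = triplet_loss(anchor, same_instance, hard_negative)
print(easy_loss, hard_loss)
```

The look-alike negative still incurs a loss while the easy one does not, which is the situation the abstract points to: when appearance alone cannot separate targets, an additional signal such as language descriptions may help.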
Taxonomy
Topics: Data Management and Algorithms · Video Surveillance and Tracking Methods
