Bootstrapping Referring Multi-Object Tracking
Yani Zhang, Dongming Wu, Wencheng Han, Xingping Dong

TL;DR
This paper introduces referring multi-object tracking (RMOT), a task that uses a language expression to guide the tracking of multiple objects over time, along with the Refer-KITTI-V2 benchmark and TempRMOT, a Transformer-based model that outperforms previous methods on Refer-KITTI and Refer-KITTI-V2.
Contribution
The paper proposes RMOT as a novel task, creates the Refer-KITTI-V2 benchmark, and develops TempRMOT, a Transformer-based framework for effective multi-object tracking guided by language.
Findings
TempRMOT outperforms previous methods on Refer-KITTI and Refer-KITTI-V2
The semi-automatic labeling pipeline efficiently generates diverse annotations
TempRMOT effectively models long-term spatial-temporal object interactions
Abstract
Referring understanding is a fundamental task that bridges natural language and visual content by localizing objects described in free-form expressions. However, existing works are constrained by limited language expressiveness, lacking the capacity to model object dynamics in spatial numbers and temporal states. To address these limitations, we introduce a new and general referring understanding task, termed referring multi-object tracking (RMOT). Its core idea is to employ a language expression as a semantic cue to guide the prediction of multi-object tracking, comprehensively accounting for variations in object quantity and temporal semantics. Along with RMOT, we introduce an RMOT benchmark named Refer-KITTI-V2, featuring scalable and diverse language expressions. To efficiently generate high-quality annotations covering object dynamics with minimal manual effort, we propose a…
Taxonomy
Topics: Target Tracking and Data Fusion in Sensor Networks
