RT-RMOT: A Dataset and Framework for RGB-Thermal Referring Multi-Object Tracking
Yanqiu Yu, Zhifan Jin, Sijia Chen, Tongfei Chu, En Yu, Liman Liu, Wenbing Tao

TL;DR
This paper introduces RT-RMOT, a new RGB-Thermal referring multi-object tracking task, along with a dedicated dataset and a multimodal large language model-based framework, achieving robust all-day tracking in challenging conditions.
Contribution
It presents the first RGB-Thermal RMOT dataset, RefRT, and a novel RTrack framework with advanced training strategies for improved multi-object tracking.
Findings
RTrack outperforms existing methods on the RefRT dataset.
The proposed strategies enhance training stability and tracking accuracy.
RT-RMOT enables effective all-day multi-object tracking in low-visibility scenarios.
Abstract
Referring Multi-Object Tracking (RMOT) has attracted increasing attention due to its human-friendly interactive characteristics, yet it is limited in low-visibility conditions such as nighttime, smoke, and other challenging scenarios. To overcome this limitation, we propose a new RGB-Thermal RMOT task, named RT-RMOT, which aims to fuse RGB appearance features with the illumination robustness of the thermal modality to enable all-day referring multi-object tracking. To promote research on RT-RMOT, we construct the first Referring Multi-Object Tracking dataset under the RGB-Thermal modality, named RefRT. It contains 388 language descriptions, 1,250 tracked targets, and 166,147 Language-RGB-Thermal (L-RGB-T) triplets. Furthermore, we propose RTrack, a framework built upon a multimodal large language model (MLLM) that integrates RGB, thermal, and textual features. Since the initial…
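The abstract describes RefRT in terms of Language-RGB-Thermal (L-RGB-T) triplets. The sketch below shows one way such a triplet could be represented, for illustration only. The class name `LRGBTTriplet`, its fields, the `(x, y, w, h)` box format, and the helper `referred_track_ids` are all hypothetical. They do not reflect RefRT's released annotation format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical box format: (x, y, w, h) in pixels. Not RefRT's actual schema.
Box = Tuple[float, float, float, float]


@dataclass
class LRGBTTriplet:
    """Hypothetical record pairing one referring expression with one RGB-thermal frame pair."""
    expression: str     # referring language description
    rgb_frame: str      # path to the RGB frame (illustrative)
    thermal_frame: str  # path to the aligned thermal frame (illustrative)
    # track_id -> box for each target the expression refers to in this frame
    referred: Dict[int, Box] = field(default_factory=dict)


def referred_track_ids(triplets: List[LRGBTTriplet], expression: str) -> Set[int]:
    """Collect every track ID that a given expression refers to across all frames."""
    ids: Set[int] = set()
    for t in triplets:
        if t.expression == expression:
            ids.update(t.referred.keys())
    return ids


# Usage with made-up example records.
triplets = [
    LRGBTTriplet("person walking on the left", "rgb/000001.jpg", "thermal/000001.jpg",
                 {3: (10, 20, 40, 90)}),
    LRGBTTriplet("person walking on the left", "rgb/000002.jpg", "thermal/000002.jpg",
                 {3: (12, 20, 40, 90), 7: (200, 30, 38, 88)}),
    LRGBTTriplet("white car", "rgb/000002.jpg", "thermal/000002.jpg",
                 {5: (300, 100, 120, 60)}),
]
print(sorted(referred_track_ids(triplets, "person walking on the left")))  # [3, 7]
```

In this sketch, a single expression can refer to a different set of targets in each frame. Targets can enter or leave the described set over time, which is why the helper takes the union of track IDs across frames.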
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology
