RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation
Hao Li, Yuhao Wang, Wenning Hao, Pingping Zhang, Dong Wang, Huchuan Lu

TL;DR
RAGTrack introduces a retrieval-augmented generation framework with language guidance and adaptive visual-language modeling to improve robustness and accuracy in RGBT tracking across diverse conditions.
Contribution
The paper proposes RAGTrack, a novel framework integrating language descriptions, a multi-modal transformer, and retrieval-augmented reasoning for enhanced RGBT tracking.
Findings
Achieves state-of-the-art results on four RGBT benchmarks.
Effectively mitigates modality gaps and background distractions.
Demonstrates robustness across challenging scenarios.
Abstract
RGB-Thermal (RGBT) tracking aims to achieve robust object localization across diverse environmental conditions by fusing visible and thermal infrared modalities. However, existing RGBT trackers rely solely on initial-frame visual information for target modeling. Without language guidance, they fail to adapt to appearance variations. Furthermore, current methods suffer from redundant search regions and heterogeneous modality gaps, which cause background distraction. To address these issues, we first introduce textual descriptions into RGBT tracking benchmarks through a pipeline that leverages Multi-modal Large Language Models (MLLMs) to automatically produce textual annotations. Afterwards, we propose RAGTrack, a novel Retrieval-Augmented Generation framework for robust RGBT tracking. To this end, we introduce a Multi-modal Transformer Encoder (MTE) for…
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
