Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark
Chenglong Li, Tianhao Zhu, Lei Liu, Xiaonan Si, Zilin Fan, Sulan Zhai

TL;DR
This paper introduces a new cross-modal object tracking dataset and a modality-aware tracking algorithm that effectively handles RGB and NIR data, addressing the challenges of heterogeneous visual properties in surveillance applications.
Contribution
The work provides the first large-scale cross-modal tracking dataset and proposes a novel, flexible modality-aware representation method for improved tracking across RGB and NIR modalities.
Findings
The proposed algorithm outperforms 17 state-of-the-art methods on the new dataset.
The dataset contains 654 sequences with over 481,000 frames, supporting robust evaluation.
The modality-aware approach effectively mitigates appearance gaps between RGB and NIR data.
Abstract
In many visual systems, visual tracking is often based on RGB image sequences, in which some targets are not clearly visible in low-light conditions, so tracking performance degrades significantly. Introducing other modalities such as depth and infrared data is an effective way to handle the imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and cannot be applied in many real-world applications at present. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, which switch between RGB and NIR imaging based on the light intensity. These two modalities are heterogeneous with very different visual properties and thus pose major challenges for visual tracking. However, existing work has not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology
