Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Solution
Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, Chunyang Cheng, Xiao-Jun Wu, and Josef Kittler

TL;DR
This paper introduces a new, diverse benchmark for RGBT tracking focused on modality validity in challenging scenarios, proposes a fusion strategy addressing the 'when to fuse' problem, and demonstrates state-of-the-art results with a mixture-of-experts approach.
Contribution
It presents MV-RGBT, a benchmark capturing severe imaging conditions, and proposes MoETrack, a mixture of experts method, to improve tracking in multi-modal scenarios.
Findings
MV-RGBT is the most diverse benchmark for multi-modal warranting (MMW) scenarios.
Fusion is not always beneficial in severe imaging conditions.
MoETrack achieves state-of-the-art results on multiple benchmarks.
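
The 'when to fuse' idea behind these findings can be illustrated with a small sketch. It is not taken from the paper: the three expert names (`rgb`, `tir`, `fused`), the `ExpertOutput` type, the `select_expert` helper, and the rule of picking the most confident expert are all assumptions made for illustration, and the numbers are made up. The point is only that a fused prediction competes with single-modality predictions instead of always being used.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class ExpertOutput:
    """Hypothetical per-frame output of one tracking expert."""
    box: Box
    confidence: float  # assumed to lie in [0, 1]


def select_expert(outputs: Dict[str, ExpertOutput]) -> Tuple[str, ExpertOutput]:
    """Return the name and output of the most confident expert.

    This is an assumed selection rule, not the authors' method.
    """
    if not outputs:
        raise ValueError("at least one expert output is required")
    name = max(outputs, key=lambda k: outputs[k].confidence)
    return name, outputs[name]


# Illustrative values for a frame where the RGB modality is invalid
# (for example, extreme illumination), so the RGB expert is unreliable
# and fusing it with TIR also degrades the fused expert.
frame_outputs = {
    "rgb": ExpertOutput((10.0, 12.0, 40.0, 80.0), 0.21),
    "tir": ExpertOutput((14.0, 11.0, 38.0, 82.0), 0.87),
    "fused": ExpertOutput((12.0, 12.0, 39.0, 81.0), 0.55),
}

chosen_name, chosen = select_expert(frame_outputs)
print(chosen_name, chosen.box)
```

In this toy frame the TIR-only expert is selected, which mirrors the finding that fusion is not always beneficial when one modality is invalid.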
Abstract
RGBT tracking draws increasing attention because of its robustness in multi-modal warranting (MMW) scenarios, such as nighttime and adverse weather conditions, where relying on a single sensing modality fails to ensure stable tracking results. However, existing benchmarks predominantly contain videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quality. This weakens the representativeness of existing benchmarks in severe imaging conditions, leading to tracking failures in MMW scenarios. To bridge this gap, we present a new benchmark that considers modality validity, MV-RGBT, captured specifically from MMW scenarios where either the RGB (extreme illumination) or the TIR (thermal truncation) modality is invalid. Hence, it is further divided into two subsets according to the valid modality, offering a new compositional perspective for…
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
