Cross Fusion RGB-T Tracking with Bi-directional Adapter
Zhirong Zeng, Xiaotao Liu, Meng Sun, Hongyu Wang, Jing Liu

TL;DR
This paper introduces CFBT, an RGB-T tracking architecture that fuses multi-modal and temporal information through three new modules (CSTAF, CSTCF, and DSTA), reporting state-of-the-art results with less than 0.3% additional parameters.
Contribution
The paper presents a cross fusion RGB-T tracking framework (CFBT) whose three modules, CSTAF, CSTCF, and DSTA, integrate multi-modal and temporal information.
Findings
Achieves state-of-the-art performance on RGB-T tracking benchmarks.
Introduces three new modules for cross spatio-temporal information fusion.
Utilizes less than 0.3% additional modal parameters for effective fusion.
Abstract
Many state-of-the-art RGB-T trackers have achieved remarkable results through modality fusion. However, these trackers often either overlook temporal information or fail to fully utilize it, resulting in an ineffective balance between multi-modal and temporal information. To address this issue, we propose a novel Cross Fusion RGB-T Tracking architecture (CFBT) that ensures the full participation of multiple modalities in tracking while dynamically fusing temporal information. The effectiveness of CFBT relies on three newly designed cross spatio-temporal information fusion modules: Cross Spatio-Temporal Augmentation Fusion (CSTAF), Cross Spatio-Temporal Complementarity Fusion (CSTCF), and Dual-Stream Spatio-Temporal Adapter (DSTA). CSTAF employs a cross-attention mechanism to enhance the feature representation of the template comprehensively. CSTCF utilizes complementary information…
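The abstract says CSTAF uses a cross-attention mechanism to enhance the template features but gives no implementation details here. As a rough illustration of generic cross-attention between two modalities, the sketch below lets RGB template tokens attend to thermal template tokens. It is not the authors' CSTAF module: the function name, token counts, feature dimension, random projection weights, and the residual connection are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (hypothetical helper, not CSTAF).

    query_tokens:   (n_q, dim) tokens to be enhanced, e.g. RGB template tokens
    context_tokens: (n_c, dim) tokens supplying information, e.g. thermal tokens
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    # Scaled dot-product scores between every query and every context token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual connection (an assumption): keep the original features and add
    # the information gathered from the other modality.
    return query_tokens + weights @ v, weights

# Toy inputs: sizes and random values are placeholders, not from the paper.
rng = np.random.default_rng(0)
dim = 8
rgb_template = rng.normal(size=(4, dim))      # 4 RGB template tokens
thermal_template = rng.normal(size=(6, dim))  # 6 thermal template tokens
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

enhanced, weights = cross_attention(rgb_template, thermal_template, w_q, w_k, w_v)
print(enhanced.shape, weights.shape)
```

Swapping the two inputs gives the opposite direction (thermal tokens attending to RGB tokens), which is one way a "cross" fusion between modalities could be made symmetric.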
Taxonomy
Topics: Video Surveillance and Tracking Methods · Infrared Target Detection Methodologies · Advanced Vision and Imaging
Methods: Adapter
