RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning
Yabin Zhu, Chenglong Li, Xiao Wang, Jin Tang, Zhixiang Huang

TL;DR
ProFormer is a progressive fusion transformer with dynamically guided learning for robust RGBT tracking. It integrates modality-specific and modality-shared features and reports state-of-the-art results on the RGBT210, RGBT234, LasHeR, and VTUAV datasets.
Contribution
The paper proposes a novel Progressive Fusion Transformer (ProFormer) together with a dynamically guided learning algorithm, improving modality integration and branch learning in RGBT tracking.
Findings
Achieves new state-of-the-art performance on RGBT210, RGBT234, LasHeR, and VTUAV datasets.
Effectively activates modality-specific information during fusion.
Enhances branch learning through adaptive guidance, improving overall tracking robustness.
Abstract
Existing Transformer-based RGBT tracking methods either use cross-attention to fuse the two modalities, or use self-attention and cross-attention to model both modality-specific and modality-shared information. However, the significant appearance gap between modalities limits the feature representation ability of certain modalities during the fusion process. To address this problem, we propose a novel Progressive Fusion Transformer called ProFormer, which progressively integrates single-modality information into the multimodal representation for robust RGBT tracking. In particular, ProFormer first uses a self-attention module to collaboratively extract the multimodal representation, and then uses two cross-attention modules to let it interact with the features of each of the two modalities. In this way, the modality-specific information can be well activated in the multimodal…
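The two-stage fusion described in the abstract (self-attention to build a multimodal representation, then two cross-attention modules against the RGB and thermal features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the joint tokens are formed by concatenation, the multimodal tokens act as queries in both cross-attention steps, the two steps are applied one after the other, attention is single-head with no learned projections, and the names `attention` and `progressive_fusion` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Single-head scaled dot-product attention without learned projections."""
    dim = query.shape[-1]
    scores = query @ key.T / np.sqrt(dim)
    return softmax(scores, axis=-1) @ value


def progressive_fusion(rgb, tir):
    """Illustrative two-stage fusion of RGB and thermal (TIR) token features.

    rgb, tir: arrays of shape (num_tokens, dim).
    Returns multimodal tokens of shape (2 * num_tokens, dim).
    """
    # Stage 1 (assumption: concatenation): self-attention over the joint
    # token set yields a multimodal representation.
    joint = np.concatenate([rgb, tir], axis=0)
    fused = joint + attention(joint, joint, joint)

    # Stage 2 (assumption: multimodal tokens are the queries): one
    # cross-attention step per modality, each with a residual connection.
    fused = fused + attention(fused, rgb, rgb)
    fused = fused + attention(fused, tir, tir)
    return fused


rng = np.random.default_rng(0)
rgb_tokens = rng.standard_normal((4, 8))  # 4 tokens, 8 channels
tir_tokens = rng.standard_normal((4, 8))
fused = progressive_fusion(rgb_tokens, tir_tokens)
print(fused.shape)  # (8, 8)
```

A practical tracker would add learned query, key, and value projections, multiple heads, normalization, and feed-forward layers. The sketch only shows the order of operations: joint self-attention first, per-modality cross-attention second.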
Taxonomy
Topics: Video Surveillance and Tracking Methods · Optical Imaging and Spectroscopy Techniques · AI in cancer detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout · Dense Connections
