CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking
Hao Li, Yuhao Wang, Xiantao Hu, Wenning Hao, Pingping Zhang, Dong Wang, Huchuan Lu

TL;DR
CADTrack is an RGBT tracking framework that combines Mamba-based feature interaction, contextual aggregation, and deformable alignment to reduce discrepancies between the visible and thermal modalities and improve tracking robustness and accuracy.
Contribution
The paper presents an RGBT tracking framework built from three components: a Mamba-based Feature Interaction (MFI) module, a Contextual Aggregation Module (CAM), and a deformable alignment module. Together they target better cross-modal fusion and spatial alignment.
Findings
CADTrack outperforms existing RGBT trackers on five benchmarks.
The Mamba-based feature interaction operates with linear complexity, which reduces computational cost.
Tracking remains robust in complex scenarios.
Abstract
RGB-Thermal (RGBT) tracking aims to exploit visible and thermal infrared modalities for robust all-weather object tracking. However, existing RGBT trackers struggle to resolve modality discrepancies, which poses great challenges for robust feature representation. This limitation hinders effective cross-modal information propagation and fusion, which significantly reduces the tracking accuracy. To address this limitation, we propose a novel Contextual Aggregation with Deformable Alignment framework called CADTrack for RGBT Tracking. To be specific, we first deploy the Mamba-based Feature Interaction (MFI) that establishes efficient feature interaction via state space models. This interaction module can operate with linear complexity, reducing computational cost and improving feature discrimination. Then, we propose the Contextual Aggregation Module (CAM) that dynamically activates…
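The abstract attributes the linear complexity of MFI to state space models. As a rough illustration of why such a recurrence scales linearly with sequence length, the sketch below runs a plain diagonal state space scan over RGB and thermal tokens concatenated into one sequence. This is not the paper's MFI module: the function name `ssm_scan`, the fixed parameters `a`, `b`, `c`, the token shapes, and the concatenation scheme are all assumptions made for the example.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear state space recurrence (illustrative, not the paper's MFI).

    x: (T, D) token sequence; a, b, c: (D,) per-channel parameters.
    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    """
    T, D = x.shape
    h = np.zeros(D)
    y = np.empty_like(x)
    for t in range(T):  # T steps with O(D) work each, so O(T * D) overall
        h = a * h + b * x[t]
        y[t] = c * h
    return y

rng = np.random.default_rng(0)
rgb_tokens = rng.normal(size=(4, 8))  # hypothetical visible-light features
tir_tokens = rng.normal(size=(4, 8))  # hypothetical thermal features
a = np.full(8, 0.9)                   # assumed per-channel state decay
b = np.ones(8)
c = np.ones(8)

# Assumed interaction scheme: scan both modalities as one sequence, so the
# state that reaches the thermal tokens already carries RGB context.
tokens = np.concatenate([rgb_tokens, tir_tokens], axis=0)
fused = ssm_scan(tokens, a, b, c)
```

Each token is visited once and the hidden state has a fixed size, so the cost grows linearly with the number of tokens, in contrast to the quadratic cost of full pairwise attention over the same sequence.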
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gaze Tracking and Assistive Technology
