CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking

Hao Li; Yuhao Wang; Xiantao Hu; Wenning Hao; Pingping Zhang; Dong Wang; Huchuan Lu

arXiv:2511.17967 · cs.CV · November 25, 2025

TL;DR

CADTrack introduces a novel framework for RGBT tracking that employs deformable alignment and contextual aggregation to effectively handle modality discrepancies and improve tracking robustness and accuracy.

Contribution

The paper presents a new RGBT tracking framework with a feature interaction module, a contextual aggregation module, and a deformable alignment module, enhancing cross-modal fusion and spatial alignment.

Findings

01. Outperforms existing RGBT trackers on five benchmarks.
02. Reduces computational complexity with linear feature interaction.
03. Achieves robust tracking in complex scenarios.

Abstract

RGB-Thermal (RGBT) tracking aims to exploit visible and thermal infrared modalities for robust all-weather object tracking. However, existing RGBT trackers struggle to resolve modality discrepancies, which poses great challenges for robust feature representation. This limitation hinders effective cross-modal information propagation and fusion, which significantly reduces the tracking accuracy. To address this limitation, we propose a novel Contextual Aggregation with Deformable Alignment framework called CADTrack for RGBT Tracking. To be specific, we first deploy the Mamba-based Feature Interaction (MFI) that establishes efficient feature interaction via state space models. This interaction module can operate with linear complexity, reducing computational cost and improving feature discrimination. Then, we propose the Contextual Aggregation Module (CAM) that dynamically activates…
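The abstract attributes the linear complexity of the interaction module to state space models. As a rough illustration of why such a recurrence scales linearly with sequence length, here is a minimal NumPy sketch of a diagonal linear state space scan over concatenated RGB and thermal tokens. It is not the paper's MFI module: the function `ssm_scan`, the parameters `a`, `b`, `c`, the token shapes, and the choice to concatenate the two modalities are all assumptions made for illustration, and the input-dependent (selective) parameters used by Mamba are omitted.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear state space recurrence over a token sequence.

    x: (L, D) tokens; a, b, c: (D, N) per-channel parameters (hypothetical).
    h_t = a * h_{t-1} + b * x_t,   y_t = sum over N of (c * h_t)
    The loop visits each token once, so cost grows linearly with L.
    """
    L, D = x.shape
    h = np.zeros_like(a)                 # hidden state, shape (D, N)
    y = np.empty((L, D))
    for t in range(L):
        h = a * h + b * x[t][:, None]    # state update with the current token
        y[t] = (c * h).sum(axis=1)       # read out one value per channel
    return y

rng = np.random.default_rng(0)
D, N = 4, 8                              # channels and state size (illustrative)
rgb = rng.standard_normal((16, D))       # stand-in for RGB tokens
tir = rng.standard_normal((16, D))       # stand-in for thermal tokens
a = rng.uniform(0.5, 0.95, size=(D, N))  # decay factors below 1 keep the state stable
b = rng.standard_normal((D, N)) * 0.1
c = rng.standard_normal((D, N))

# Assumed interaction scheme: scan both modalities as one sequence so the
# state built from RGB tokens carries into the thermal tokens.
tokens = np.concatenate([rgb, tir], axis=0)
out = ssm_scan(tokens, a, b, c)
rgb_out, tir_out = out[:16], out[16:]
```

Because this scan is causal, the thermal outputs depend on the RGB tokens but not the other way round; a second pass in the reverse order would be one way to make the exchange symmetric. Whether CADTrack does anything similar is not stated in the abstract.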

Videos

CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gaze Tracking and Assistive Technology