Learning Progressive Adaptation for Multi-Modal Tracking
He Wang, Tianyang Xu, Zhangyong Tang, Xiao-Jun Wu, Josef Kittler

TL;DR
This paper introduces PATrack, a progressive adaptation framework for multi-modal tracking. It adapts an RGB pre-trained tracker at three levels (modality-specific features, inter-modal interactions, and the prediction head) and reports improved tracking performance on RGB+Thermal, RGB+Depth, and RGB+Event benchmarks.
Contribution
The paper proposes a novel progressive adaptation approach with modality-dependent, modality-entangled, and task-level adapters for effective multi-modal tracking.
Findings
Outperforms state-of-the-art methods on RGB+Thermal, RGB+Depth, and RGB+Event tracking tasks.
Effectively enhances intra-modal and inter-modal feature representations.
Demonstrates robustness and adaptability across diverse multi-modal tracking scenarios.
Abstract
Due to the limited availability of paired multi-modal data, multi-modal trackers are typically built by adopting pre-trained RGB models with parameter-efficient fine-tuning modules. However, these fine-tuning methods overlook advanced adaptations for applying RGB pre-trained models and fail to modulate a single specific modality, cross-modal interactions, and the prediction head. To address these issues, we propose to perform Progressive Adaptation for Multi-Modal Tracking (PATrack). This innovative approach incorporates modality-dependent, modality-entangled, and task-level adapters, effectively bridging the gap in adapting RGB pre-trained networks to multi-modal data through a progressive strategy. Specifically, modality-specific information is enhanced through the modality-dependent adapter, decomposing the high- and low-frequency components, which ensures a more robust feature…
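The abstract says the modality-dependent adapter decomposes features into high- and low-frequency components, but this excerpt does not say how. The sketch below is only an illustration of that general idea, not the authors' implementation. It assumes a simple box filter as the low-pass step and treats the residual as the high-frequency part. The names split_frequencies and frequency_adapter, and the per-channel weights w_low and w_high, are hypothetical.

```python
import numpy as np


def split_frequencies(feat, kernel=3):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Assumption: low frequency = local mean from a box filter,
    high frequency = what remains after subtracting it.
    """
    pad = kernel // 2
    # Edge padding keeps the output the same spatial size as the input.
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    low = np.zeros_like(feat, dtype=np.float64)
    height, width = feat.shape[1], feat.shape[2]
    for dy in range(kernel):
        for dx in range(kernel):
            low += padded[:, dy:dy + height, dx:dx + width]
    low /= kernel * kernel
    high = feat - low
    return low, high


def frequency_adapter(feat, w_low, w_high):
    """Hypothetical residual adapter: re-weight each band per channel
    and add the result back onto the (frozen) input features."""
    low, high = split_frequencies(feat)
    return feat + w_low[:, None, None] * low + w_high[:, None, None] * high


# Example: a random 4-channel feature map standing in for one modality.
feat = np.random.default_rng(0).normal(size=(4, 8, 8))
adapted = frequency_adapter(feat, w_low=np.full(4, 0.1), w_high=np.full(4, 0.5))
print(adapted.shape)
```

With both weight vectors set to zero the adapter is an identity mapping, so the pre-trained features pass through unchanged. Adapter modules are commonly initialised this way, though whether PATrack does so is not stated here.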
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Face recognition and analysis
