DMTrack: Spatio-Temporal Multimodal Tracking via Dual-Adapter
Weihong Li, Shaohua Dong, Haonan Lu, Yanhao Zhang, Heng Fan, Libo Zhang

TL;DR
DMTrack introduces a dual-adapter architecture for efficient spatio-temporal multimodal tracking, achieving state-of-the-art results with only 0.93 million trainable parameters by bridging modality gaps and enhancing cross-modality fusion.
Contribution
The paper proposes a novel dual-adapter architecture with spatio-temporal and progressive modality adapters for improved multimodal tracking.
Findings
Achieves state-of-the-art performance on five benchmarks.
Operates with only 0.93 million trainable parameters.
Effectively bridges modality gaps and enhances fusion.
Abstract
In this paper, we explore adapter tuning and introduce DMTrack, a novel dual-adapter architecture for spatio-temporal multimodal tracking. The key to DMTrack lies in two simple yet effective modules: a spatio-temporal modality adapter (STMA) and a progressive modality complementary adapter (PMCA). The former, applied to each modality independently, adjusts spatio-temporal features extracted from a frozen backbone through self-prompting, which partly bridges the gap between modalities and thus enables better cross-modality fusion. The latter facilitates cross-modality prompting progressively through two specially designed pixel-wise adapters, one shallow and one deep. The shallow adapter shares parameters between the two modalities, aiming to bridge the information flow between the two modality branches, thereby laying the foundation for…
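The abstract does not spell out the internals of the adapters, so the following is only a minimal sketch of the general idea of a parameter-shared adapter on top of frozen features, not the authors' implementation. It assumes a generic residual bottleneck design (down-projection, ReLU, up-projection) with a zero-initialised up-projection; the class name `BottleneckAdapter`, the helper `matmul`, the dimensions, and the inputs `mod_a` and `mod_b` are all hypothetical and chosen for illustration.

```python
import random


def matmul(x, w):
    """Multiply an (n x d) token matrix by a (d x k) weight matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: x + up(relu(down(x))).

    This is a generic adapter layout assumed for illustration; it is not
    taken from the DMTrack paper.
    """

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        # Small random down-projection (dim -> hidden).
        self.down = [[rng.gauss(0, 0.02) for _ in range(hidden)] for _ in range(dim)]
        # Zero-initialised up-projection (hidden -> dim): the adapter starts
        # as an identity map, so the frozen features pass through unchanged.
        self.up = [[0.0] * dim for _ in range(hidden)]

    def num_params(self):
        """Count trainable weights (no biases in this sketch)."""
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])

    def __call__(self, x):
        hidden = [[max(0.0, v) for v in row] for row in matmul(x, self.down)]
        delta = matmul(hidden, self.up)
        # Residual connection around the bottleneck.
        return [[a + b for a, b in zip(rx, rd)] for rx, rd in zip(x, delta)]


# Stand-ins for frozen-backbone features of two modalities: 4 tokens of dim 8.
rng = random.Random(1)
mod_a = [[rng.random() for _ in range(8)] for _ in range(4)]
mod_b = [[rng.random() for _ in range(8)] for _ in range(4)]

# One adapter instance, i.e. one set of weights, reused by both branches.
shared = BottleneckAdapter(dim=8, hidden=2)
out_a, out_b = shared(mod_a), shared(mod_b)

print("trainable parameters:", shared.num_params())
```

Because both branches call the same instance, the trainable weight count is paid once rather than once per modality, which is one way a shared adapter can keep the trainable footprint small while the backbone stays frozen.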
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Face recognition and analysis
