SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu

TL;DR
SDSTrack introduces a symmetric multimodal tracking framework with lightweight adaptation and masked patch distillation, significantly improving robustness and performance across various multimodal tracking scenarios and challenging environments.
Contribution
The paper proposes a novel symmetric multimodal tracking framework with efficient adaptation and a distillation strategy, addressing modality gaps and enhancing robustness in complex conditions.
Findings
Outperforms state-of-the-art trackers in RGB+Depth, RGB+Thermal, and RGB+Event tracking
Effective in extreme weather and sensor failure scenarios
Achieves superior robustness and accuracy in multimodal tracking
Abstract
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the modality gap limits pre-trained knowledge recall, and the dominance of the RGB modality persists, preventing the full utilization of information from other modalities. To address these issues, we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning, which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Air Quality Monitoring and Forecasting · Fire Detection and Safety Systems
