TL;DR
This paper introduces a novel unsupervised cross-modal distillation method that leverages paired RGB-TIR data to improve thermal infrared tracking by transferring knowledge from RGB models.
Contribution
It proposes a generic cross-modal distillation approach that enhances TIR tracking by learning TIR-specific features from RGB models using unlabeled paired data.
Findings
Outperforms the baseline tracker (DiMP) with a 2.3% gain in Success
Effectively learns TIR-specific representations
Demonstrates robustness on multiple datasets
Abstract
The target representation learned by convolutional neural networks plays an important role in Thermal Infrared (TIR) tracking. Currently, most top-performing TIR trackers still employ representations learned by models trained on RGB data. However, such representations do not take into account the information in the TIR modality itself, which limits the performance of TIR tracking. To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data. We take advantage of the two-branch architecture of the baseline tracker, i.e., DiMP, for cross-modal distillation working on two components of the tracker. Specifically, we use one branch as a teacher module to distill the representation learned by the model into the other branch. Benefiting from the…
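The teacher-student idea in the abstract can be illustrated with a toy sketch. It is not the paper's implementation and does not reproduce DiMP: the linear "feature extractors", the mean-squared-error distillation loss, the synthetic paired data, and the choice to initialise the student from the teacher are all assumptions made for illustration, and every function name is hypothetical. What it shows is that a frozen teacher branch fed RGB inputs can supervise a student branch fed the paired TIR inputs, with no tracking labels involved.

```python
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill_step(student_w, teacher_w, rgb, tir, lr):
    """One SGD step on the student only; the teacher stays frozen.

    Returns the distillation loss measured before the update.
    """
    target = linear(teacher_w, rgb)  # teacher branch sees the RGB frame
    pred = linear(student_w, tir)    # student branch sees the paired TIR frame
    n = len(pred)
    for i, row in enumerate(student_w):
        # Gradient of the MSE with respect to row i of the student weights.
        grad_scale = 2.0 * (pred[i] - target[i]) / n
        for j in range(len(row)):
            row[j] -= lr * grad_scale * tir[j]
    return mse(pred, target)


def make_pairs(count, dim, rng):
    """Synthetic stand-in for unlabeled paired RGB-TIR data (assumption)."""
    pairs = []
    for _ in range(count):
        rgb = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        # Toy assumption: TIR is a rescaled, slightly noisy view of RGB.
        tir = [0.5 * v + rng.gauss(0.0, 0.01) for v in rgb]
        pairs.append((rgb, tir))
    return pairs


def run_demo(epochs=50, dim=4, out_dim=3, seed=0):
    """Train the student by distillation; return the mean loss per epoch."""
    rng = random.Random(seed)
    teacher_w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                 for _ in range(out_dim)]
    # Assumption: the student starts as a copy of the RGB-trained teacher.
    student_w = [row[:] for row in teacher_w]
    pairs = make_pairs(64, dim, rng)
    history = []
    for _ in range(epochs):
        total = 0.0
        for rgb, tir in pairs:
            total += distill_step(student_w, teacher_w, rgb, tir, lr=0.1)
        history.append(total / len(pairs))
    return history


if __name__ == "__main__":
    losses = run_demo()
    print("first epoch loss:", losses[0])
    print("last epoch loss:", losses[-1])
```

In this sketch only the student's weights are updated, so the student adapts to TIR inputs while the teacher's RGB-trained mapping serves as a fixed target. A real tracker would replace the linear maps with the tracker's convolutional branches and would likely use a different loss.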
