TL;DR
This paper introduces a novel approach for Unsupervised Domain Adaptation using Vision-Language Pre-training models, combining Cross-Modal Knowledge Distillation and Residual Sparse Training to improve performance and reduce storage needs.
Contribution
The paper proposes a new method leveraging VLP models for UDA, introducing CMKD and RST to enhance performance and efficiency over existing techniques.
Findings
Achieves state-of-the-art results on standard benchmarks.
Reduces storage overhead significantly compared to traditional fine-tuning.
Demonstrates the effectiveness of VLP models in UDA tasks.
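The storage finding can be illustrated with a small sketch. The abstract excerpt on this page is cut off before it describes RST, so what follows is only an assumption drawn from the method's name and the storage claim: one shared pre-trained model is kept, and each transfer task stores just a sparse residual of weight changes. The function names, the threshold rule, and the toy weights are all hypothetical, and the thresholding step is a stand-in rather than the paper's procedure.

```python
def sparse_residual(pretrained, finetuned, threshold=1e-3):
    """Keep only the weight changes whose magnitude exceeds the threshold.

    Hypothetical stand-in for a sparse residual; not the paper's RST rule.
    """
    return {
        i: f - p
        for i, (p, f) in enumerate(zip(pretrained, finetuned))
        if abs(f - p) > threshold
    }


def restore(pretrained, residual):
    """Rebuild task-specific weights from the shared model plus its residual."""
    weights = list(pretrained)
    for i, delta in residual.items():
        weights[i] += delta
    return weights


# Toy flat weight vectors (hypothetical values).
pretrained = [0.50, -0.20, 0.10, 0.80]
finetuned = [0.50, -0.20, 0.35, 0.80]  # only one weight changed

residual = sparse_residual(pretrained, finetuned)
restored = restore(pretrained, residual)
```

Under this assumption, per-task storage grows with the number of changed weights rather than with the full model size, which is the kind of saving the finding refers to.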
Abstract
This paper addresses two challenges in Unsupervised Domain Adaptation (UDA), focusing on Vision-Language Pre-training (VLP) models. First, UDA has relied primarily on ImageNet pre-trained models, while the potential of VLP models in UDA remains largely unexplored, even though their rich representations hold significant promise for UDA tasks. To address this, we propose Cross-Modal Knowledge Distillation (CMKD), which uses VLP models as teachers to guide learning in the target domain and achieves state-of-the-art performance. Second, current UDA paradigms train a separate model for each task, which leads to significant storage overhead and makes deployment impractical as the number of transfer tasks grows. To overcome this, we introduce Residual Sparse Training (RST), exploiting the…
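The teacher-student idea in the abstract can be sketched generically. The abstract does not give the CMKD objective, so this is not the authors' loss: it assumes a CLIP-style teacher that turns image-text similarities into zero-shot class probabilities, and a plain KL-divergence distillation term on one unlabeled target sample. The names `teacher_probs` and `distillation_loss`, the `scale` and `temperature` parameters, and the toy embeddings are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def teacher_probs(image_emb, text_embs, scale=100.0):
    """Zero-shot class probabilities from image-text similarity (assumed CLIP-style)."""
    sims = [scale * cosine(image_emb, t) for t in text_embs]
    return softmax(sims)


def distillation_loss(student_logits, teacher_p, temperature=1.0):
    """KL(teacher || student) for a single unlabeled target sample."""
    student_p = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher_p, student_p) if t > 0)


# Toy example: two classes, one text embedding per class (hypothetical values).
text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [0.9, 0.1]

p = teacher_probs(image_emb, text_embs, scale=10.0)
loss_agree = distillation_loss([2.0, 0.0], p)     # student favors the teacher's class
loss_disagree = distillation_loss([0.0, 2.0], p)  # student favors the other class
```

In this sketch the loss is smaller when the student agrees with the teacher, so minimizing it on target images pulls the student toward the teacher's predictions without using target labels.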
Taxonomy
Methods: Knowledge Distillation · Focus · FixMatch
