Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training
Wenyu Zhang, Li Shen, Chuan-Sheng Foo

TL;DR
This paper introduces a flexible framework for source-free domain adaptation that leverages pre-trained vision and vision-language models, improving pseudolabel quality and adaptation performance across various challenging scenarios.
Contribution
It proposes an integrated approach to incorporate pre-trained networks, including CLIP, into the SFDA process, enhancing representation and adaptation capabilities.
Findings
Improved target domain adaptation performance on multiple benchmarks.
Effective integration of CLIP for zero-shot classification in SFDA.
Robustness in open-set, partial-set, and open-partial scenarios.
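As a rough illustration of the CLIP-style zero-shot classification mentioned in the findings, the sketch below assigns each image to the class whose text embedding is most similar under cosine similarity. It is a generic sketch, not the paper's implementation: the function name `zero_shot_predict`, the `temperature` value, and the toy embeddings are all assumptions for illustration, and the summary does not say how the authors combine such predictions with source-model pseudolabels.

```python
import numpy as np

def zero_shot_predict(image_emb, text_emb, temperature=0.01):
    """Hypothetical helper: CLIP-style zero-shot labels from precomputed embeddings.

    image_emb: (n_images, d) array of image features.
    text_emb:  (n_classes, d) array of class-prompt text features.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Scaled similarity logits, then a numerically stable softmax over classes.
    logits = image_emb @ text_emb.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs

# Toy embeddings (assumed values, standing in for real encoder outputs).
class_text_emb = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0],
                           [0.0, 0.0, 1.0, 0.0]])
target_image_emb = np.array([[0.1, 0.0, 0.9, 0.0],
                             [1.0, 0.2, 0.0, 0.0]])

labels, probs = zero_shot_predict(target_image_emb, class_text_emb)
print(labels)  # [2 0]
```

In an SFDA setting, labels like these could serve as one source of target pseudolabels alongside the source model's own predictions; how the two are reconciled is left open here.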
Abstract
Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the source model at the start of source training, and subsequently discarded. Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework to incorporate pre-trained networks into the target adaptation process. The proposed framework is flexible and allows us to plug modern…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
Methods: Contrastive Language-Image Pre-training
