Rethinking Domain Adaptation and Generalization in the Era of CLIP

Ruoyu Feng; Tao Yu; Xin Jin; Xiaoyuan Yu; Lei Xiao; Zhibo Chen

arXiv:2407.15173·cs.CV·July 23, 2024

Open Access

TL;DR

This paper examines how CLIP, a large vision-language model, adapts and generalizes across domains. It combines a simple domain prior, a zero-shot adaptation benchmark, and pseudo-label self-training, and it challenges traditional domain adaptation approaches.

Contribution

It introduces a new perspective on domain adaptation with CLIP, including a benchmark for zero-shot adaptation and a method for improving generalization across multiple unlabeled domains.

Findings

01

A simple domain prior enhances CLIP's zero-shot recognition.

02

CLIP's adaptation depends less on source data due to diverse pre-training.

03

Proposed methods improve generalization in multi-domain scenarios.
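The first finding, that a simple domain prior helps zero-shot recognition, can be illustrated with a minimal sketch. It assumes the prior is injected through the text prompt, for example by replacing the generic word "photo" with a domain name such as "sketch". The template wording, the function names `build_prompts` and `zero_shot_predict`, and the toy embeddings are all hypothetical and are not taken from the paper. In practice the embeddings would come from CLIP's image and text encoders.

```python
import math


def build_prompts(class_names, domain=None):
    """Return one text prompt per class.

    `domain` is an optional domain prior (e.g. "sketch", "painting").
    The template wording is an assumption made for illustration.
    """
    if domain is None:
        return [f"a photo of a {name}" for name in class_names]
    return [f"a {domain} of a {name}" for name in class_names]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb, text_embs):
    """Return the index of the prompt embedding closest to the image embedding."""
    scores = [cosine(image_emb, t) for t in text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


# Prompts with and without the (assumed) domain prior.
generic = build_prompts(["dog", "cat"])
with_prior = build_prompts(["dog", "cat"], domain="sketch")

# Toy embeddings standing in for encoder outputs.
predicted = zero_shot_predict([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]])
```

Only the prompt text changes between the two settings. The classifier itself, a nearest-prompt lookup in embedding space, stays the same, which is why such a prior is cheap to apply.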

Abstract

Recent studies on domain adaptation have placed significant emphasis on transferring shared knowledge from a source domain to a target domain. Recently, the large vision-language pre-trained model CLIP has shown strong zero-shot recognition ability, and parameter-efficient tuning can further improve its performance on specific tasks. This work demonstrates that a simple domain prior boosts CLIP's zero-shot recognition in a specific domain. In addition, CLIP's adaptation relies less on source domain data because of its diverse pre-training dataset. Furthermore, we create a benchmark for zero-shot adaptation and pseudo-labeling-based self-training with CLIP. Finally, we propose to improve the task generalization ability of CLIP from multiple unlabeled domains, which is a more practical and unique scenario. We believe our findings motivate a rethinking…
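The abstract does not spell out how the pseudo-labeling-based self-training works, so the following is only a sketch of one common recipe: keep a zero-shot prediction as a pseudo-label when its softmax confidence clears a threshold. The function names, the threshold of 0.9, and the toy logits are assumptions for illustration and are not the paper's settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(logits_per_sample, threshold=0.9):
    """Keep (sample_index, label, confidence) for confident predictions only.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    selected = []
    for idx, logits in enumerate(logits_per_sample):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((idx, label, probs[label]))
    return selected


# Toy logits for two unlabeled target images over three classes.
toy_logits = [
    [5.0, 0.0, 0.0],  # confident prediction for class 0
    [0.1, 0.0, 0.2],  # near-uniform, so it is filtered out
]
pseudo = select_pseudo_labels(toy_logits, threshold=0.9)
```

In a full self-training loop, the selected pairs would serve as training targets for a tuning step, after which predictions are recomputed and the selection is repeated.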

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training