Domain-Invariant Prompt Learning for Vision-Language Models
Arsham Gholamzadeh Khoee, Yinan Yu, and Robert Feldt

TL;DR
This paper introduces DiCoOp, a domain-invariant prompt learning method for vision-language models that improves zero-shot recognition across unseen domains through adversarial training.
Contribution
DiCoOp extends CoOp (Context Optimization) with adversarial training to explicitly learn domain-invariant prompts, improving domain generalization in vision-language models.
Findings
DiCoOp outperforms CoOp in cross-domain recognition tasks.
Adversarial training effectively enforces domain invariance.
Experimental results demonstrate improved robustness across diverse visual domains.
Abstract
Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting methods such as Context Optimization (CoOp) effectively adapt these models to downstream recognition tasks by learning a set of context vectors. However, CoOp lacks explicit mechanisms for handling domain shifts to unseen distributions. To address this, we propose Domain-invariant Context Optimization (DiCoOp), an extension of CoOp optimized for domain generalization. By employing an adversarial training approach, DiCoOp forces the model to learn domain-invariant prompts while preserving discriminative power for classification. Experimental results show that DiCoOp consistently surpasses CoOp on domain generalization tasks across diverse visual domains.
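The abstract combines two mechanisms: CoOp's shared, learnable context vectors and an adversarial term that pushes the prompts toward domain invariance. This page does not give DiCoOp's exact architecture or loss, so what follows is only a minimal stdlib-Python sketch of a generic DANN-style min-max objective built on a CoOp-style prompt. It rests on several assumptions. The mean-pooling `toy_text_encoder` stands in for CLIP's frozen text encoder. The logistic domain discriminator (`domain_bce`, weights `w`, `b`) reads the prompt-conditioned class probabilities. The values of `lam` and `tau` are arbitrary. All of these names are hypothetical, and the discriminator's input in particular is a guess, not the authors' design.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def logsumexp(xs):
    m = max(xs)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(logits):
    lse = logsumexp(logits)
    return [math.exp(z - lse) for z in logits]

def toy_text_encoder(token_embs):
    # HYPOTHETICAL stand-in for CLIP's frozen text encoder: mean-pools token embeddings.
    dim = len(token_embs[0])
    return [sum(t[d] for t in token_embs) / len(token_embs) for d in range(dim)]

def coop_logits(image_feat, context, class_embs, tau=0.01):
    # CoOp-style prompt per class: [V_1, ..., V_M, CLASS]; the context vectors are shared
    # across classes and would be the only trainable parameters of the prompt.
    return [cosine(image_feat, toy_text_encoder(context + [c])) / tau for c in class_embs]

def domain_bce(features, domain_label, w, b):
    # HYPOTHETICAL logistic domain discriminator; its input choice is an assumption.
    z = sum(wi * fi for wi, fi in zip(w, features)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -(domain_label * math.log(p) + (1 - domain_label) * math.log(1.0 - p))

def adversarial_losses(image_feat, class_label, domain_label, context, class_embs, w, b, lam=0.1):
    logits = coop_logits(image_feat, context, class_embs)
    # Cross-entropy via log-sum-exp, so a confident logit never hits log(0).
    l_cls = logsumexp(logits) - logits[class_label]
    # Assumed discriminator input: the prompt-conditioned class probabilities.
    l_dom = domain_bce(softmax(logits), domain_label, w, b)
    # Prompt learner: stay discriminative for classes while making the domain hard to predict.
    prompt_loss = l_cls - lam * l_dom
    # Discriminator: predict the domain as accurately as possible.
    disc_loss = l_dom
    return prompt_loss, disc_loss, l_cls, l_dom

# Toy usage: M = 2 context vectors, 2 classes, 3-dimensional embeddings (all made up).
context = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]
class_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_feat = [0.9, 0.1, 0.0]
w, b = [0.5, -0.5], 0.0
prompt_loss, disc_loss, l_cls, l_dom = adversarial_losses(
    image_feat, class_label=0, domain_label=1,
    context=context, class_embs=class_embs, w=w, b=b,
)
print(f"L_cls={l_cls:.4f}  L_dom={l_dom:.4f}  prompt_loss={prompt_loss:.4f}")
```

The sketch only computes the two opposing losses. In real training, an autograd framework would update the context vectors on `prompt_loss` and the discriminator on `disc_loss`, either in alternation or in a single pass through a gradient reversal layer.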
