Latent Domain Prompt Learning for Vision-Language Models

Zhixing Li; Arsham Gholamzadeh Khoee; Yinan Yu

arXiv:2511.00067 · cs.LG · February 2, 2026

TL;DR

This paper introduces a novel approach for domain generalization in vision-language models by automatically discovering latent domains through clustering, enabling better adaptation to unseen domains without relying on explicit domain labels.

Contribution

The paper proposes a latent domain prompt learning method that automatically identifies and leverages latent domains to improve robustness of vision-language models under domain shift.

Findings

01 · Consistent performance improvements over baselines on four benchmarks.

02 · Effective latent domain clustering enhances model robustness.

03 · Provides new insights into domain generalization without explicit labels.

Abstract

The objective of domain generalization (DG) is to make models robust to domain shift. DG is crucial for deploying vision-language models (VLMs) in real-world applications, yet most existing methods rely on domain labels that may be unavailable and are often ambiguous. We instead study the DG setting where models must generalize well without access to explicit domain labels. Our key idea is to represent an unseen target domain as a combination of latent domains automatically discovered from training data, enabling the model to adaptively transfer knowledge across domains. To realize this, we perform latent domain clustering on image features and fuse domain-specific text features based on the similarity between the input image and each latent domain. Experiments on four benchmarks show that this strategy yields consistent gains over VLM-based baselines and provides new…
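The two steps named in the abstract, clustering image features into latent domains and fusing domain-specific text features by image-to-domain similarity, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: plain k-means as the clustering method, cosine similarity with a softmax (and the `temperature` value) as the weighting rule, and a `(K, C, D)` array of per-domain class text features standing in for the output of learned domain-specific prompts. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def kmeans(features, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering method, not from the paper).

    features: (N, D) image features. Returns (centroids (k, D), assignments (N,)).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=k, replace=False)
    centroids = features[idx].copy()
    for _ in range(n_iters):
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignments against the final centroids.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dists.argmin(axis=1)

def domain_weights(image_feat, centroids, temperature=0.1):
    """Softmax over cosine similarity between one image and each latent domain."""
    sims = l2_normalize(centroids) @ l2_normalize(image_feat)  # (K,)
    logits = sims / temperature
    logits = logits - logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def fuse_text_features(weights, domain_text_feats):
    """Weighted sum of per-domain text features.

    weights: (K,), domain_text_feats: (K, C, D). Returns (C, D), unit-normalized.
    """
    fused = np.tensordot(weights, domain_text_feats, axes=1)
    return l2_normalize(fused)

def classify(image_feat, centroids, domain_text_feats):
    """Predict a class index from the fused text features; also return the weights."""
    w = domain_weights(image_feat, centroids)
    fused = fuse_text_features(w, domain_text_feats)
    logits = fused @ l2_normalize(image_feat)  # (C,) cosine similarities
    return int(logits.argmax()), w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for encoder outputs: 200 images, 16-dim features.
    train_feats = l2_normalize(rng.normal(size=(200, 16)))
    centroids, _ = kmeans(train_feats, k=4)
    # Random stand-in for text features from 4 domain prompts over 10 classes.
    text_feats = l2_normalize(rng.normal(size=(4, 10, 16)))
    pred, w = classify(train_feats[0], centroids, text_feats)
    print("predicted class:", pred, "domain weights:", np.round(w, 3))
```

In this sketch a test image from an unseen domain never needs a domain label: its weights over the discovered clusters decide how much each domain's text features contribute, which matches the abstract's description of a target domain as a combination of latent domains.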

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Face recognition and analysis