Generalizing Vision-Language Models with Dedicated Prompt Guidance
Xinyao Li, Yinjie Min, Hongbo Chen, Zhekai Du, Fengling Li, Jingjing Li

TL;DR
This paper introduces GuiDG, a novel framework that enhances the domain generalization of vision-language models by using prompt-guided expert models and adaptive integration, supported by theoretical insights and extensive experiments.
Contribution
The paper provides a theoretical analysis of VLM fine-tuning for DG and proposes a two-step, prompt-guided expert framework that improves generalization performance.
Findings
GuiDG outperforms state-of-the-art fine-tuning methods on DG benchmarks.
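Theoretical analysis shows that parameter-efficient expert models trained on partitioned source domains generalize better than a single universal model fine-tuned on the entire dataset.
The authors construct ImageNet-DG for comprehensive few-shot DG evaluation.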
Abstract
Fine-tuning large pretrained vision-language models (VLMs) has emerged as a prevalent paradigm for downstream adaptation, yet it faces a critical trade-off between domain specificity and domain generalization (DG) ability. Current methods typically fine-tune a universal model on the entire dataset, which potentially compromises the ability to generalize to unseen domains. To fill this gap, we provide a theoretical understanding of the generalization ability for VLM fine-tuning, which reveals that training multiple parameter-efficient expert models on partitioned source domains leads to better generalization than fine-tuning a universal model. Inspired by this finding, we propose a two-step domain-expert-Guided DG (GuiDG) framework. GuiDG first employs prompt tuning to obtain source domain experts, then introduces a Cross-Modal Attention module to guide the fine-tuning of the vision…
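The TL;DR mentions "adaptive integration" of prompt-guided experts, but the truncated abstract does not say how GuiDG performs it. The sketch below is therefore not the authors' method. It only illustrates the generic idea of mixing the predictions of several domain experts with softmax gating weights. The function name `integrate_experts`, the `gate_scores` input (for example, an image-to-domain-prompt similarity), and the `temperature` parameter are all hypothetical names introduced for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def integrate_experts(expert_logits, gate_scores, temperature=1.0):
    """Mix per-expert class predictions with softmax gating weights.

    Hypothetical illustration only, not GuiDG's actual integration rule.

    expert_logits: shape (K, C), class logits from K domain experts.
    gate_scores:   shape (K,), how well each expert matches the input
                   (assumed to be given, e.g. a similarity score).
    Returns (mixed_probs of shape (C,), weights of shape (K,)).
    """
    expert_logits = np.asarray(expert_logits, dtype=float)
    gate_scores = np.asarray(gate_scores, dtype=float)

    # Turn gate scores into a convex combination over experts.
    weights = softmax(gate_scores / temperature)

    # Each expert's logits become a distribution over the C classes.
    probs = softmax(expert_logits, axis=-1)

    # Weighted average of the expert distributions.
    mixed = weights @ probs
    return mixed, weights


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 2.0]]  # two experts that disagree
    mixed, w = integrate_experts(logits, gate_scores=[1.0, 0.0])
    print("weights:", w)
    print("mixed probabilities:", mixed)
```

With equal gate scores the two experts are averaged; as one gate score grows, the mixture moves toward that expert's prediction. A lower `temperature` makes the weighting sharper.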
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
