Generalizing Vision-Language Models with Dedicated Prompt Guidance

Xinyao Li; Yinjie Min; Hongbo Chen; Zhekai Du; Fengling Li; Jingjing Li

arXiv:2512.02421·cs.CV·March 13, 2026

Open Access

TL;DR

This paper introduces GuiDG, a novel framework that enhances the domain generalization of vision-language models by using prompt-guided expert models and adaptive integration, supported by theoretical insights and extensive experiments.

Contribution

The paper provides a theoretical analysis of VLM fine-tuning for domain generalization (DG) and proposes a two-step, prompt-guided expert framework that improves generalization performance.

Findings

01

GuiDG outperforms state-of-the-art fine-tuning methods on DG benchmarks.

02

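The theoretical analysis shows that parameter-efficient expert models trained on partitioned source domains generalize better than a single universal model fine-tuned on the entire dataset.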

03

The authors construct ImageNet-DG for comprehensive few-shot DG evaluation.

Abstract

Fine-tuning large pretrained vision-language models (VLMs) has emerged as a prevalent paradigm for downstream adaptation, yet it faces a critical trade-off between domain specificity and domain generalization (DG) ability. Current methods typically fine-tune a universal model on the entire dataset, which potentially compromises the ability to generalize to unseen domains. To fill this gap, we provide a theoretical understanding of the generalization ability for VLM fine-tuning, which reveals that training multiple parameter-efficient expert models on partitioned source domains leads to better generalization than fine-tuning a universal model. Inspired by this finding, we propose a two-step domain-expert-Guided DG (GuiDG) framework. GuiDG first employs prompt tuning to obtain source domain experts, then introduces a Cross-Modal Attention module to guide the fine-tuning of the vision…
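
To make the idea of combining several domain experts concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the GuiDG method: the function name `integrate_experts`, the use of each expert's mean class embedding as a gating key, and the temperature value 0.07 are all hypothetical choices made for this example. Each expert is represented only by the class text embeddings its tuned prompts would produce, and the image feature is taken as given.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def integrate_experts(image_feat, expert_text_feats, temperature=0.07):
    """Mix per-expert class predictions with image-dependent weights.

    image_feat:        (d,) image embedding.
    expert_text_feats: (n_experts, n_classes, d) class text embeddings,
                       one set per source-domain expert (assumed given).
    Returns (class_probs of shape (n_classes,), weights of shape (n_experts,)).
    """
    img = l2_normalize(image_feat)
    txt = l2_normalize(expert_text_feats)

    # Per-expert class scores: cosine similarity between image and class text.
    logits = txt @ img                                # (n_experts, n_classes)
    probs = softmax(logits / temperature, axis=-1)    # (n_experts, n_classes)

    # Hypothetical gating: summarize each expert by its mean class embedding
    # and weight experts by how well that summary matches the image.
    keys = l2_normalize(txt.mean(axis=1))             # (n_experts, d)
    weights = softmax(keys @ img / temperature)       # (n_experts,)

    # Convex combination of the experts' class distributions.
    return weights @ probs, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=8)
    experts = rng.normal(size=(3, 5, 8))  # 3 experts, 5 classes, dim 8
    class_probs, expert_weights = integrate_experts(image, experts)
    print(class_probs.shape, expert_weights.shape)
```

Because the weights form a convex combination, the mixed output remains a valid probability distribution over classes. A learned attention module could replace the fixed mean-embedding gate used here.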

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis