Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

De Cheng; Zhipeng Xu; Xinyang Jiang; Dongsheng Li; Nannan Wang; Xinbo Gao

arXiv:2507.02288 · cs.CV · July 4, 2025

TL;DR

This paper introduces a novel framework for domain generalization that leverages language-guided disentanglement of visual prompts and representation alignment, improving model robustness across unseen domains.

Contribution

The paper proposes a text-feature-guided visual prompt tuning framework, combined with Worst Explicit Representation Alignment (WERA), to strengthen domain-invariant features in visual models.
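This page does not spell out how WERA is computed. As a rough sketch, assuming that "worst" means the loss is driven by the least-aligned augmented view, one could score alignment as the maximum cosine distance between an image embedding and the embeddings of its augmented views. The function names and the use of cosine distance are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def worst_view_alignment_loss(
    original: Sequence[float], augmented_views: List[Sequence[float]]
) -> float:
    """Hypothetical worst-case alignment loss (not the authors' WERA formula).

    Measures the cosine distance (1 - cosine similarity) between the embedding
    of an original image and the embedding of each augmented view, then returns
    the largest distance, so only the least-aligned view drives the loss.
    """
    if not augmented_views:
        raise ValueError("need at least one augmented view")
    distances = [1.0 - cosine_similarity(original, view) for view in augmented_views]
    return max(distances)


# Toy embeddings: one view matches the original, the other is orthogonal to it.
loss = worst_view_alignment_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(loss)  # 1.0
```

Taking the maximum instead of the mean means that minimizing this value pushes the encoder to keep even the hardest augmented view close to the original, which is one plausible reading of a worst-case alignment constraint.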

Findings

1. Outperforms state-of-the-art DG methods on multiple datasets
2. Effective disentanglement of text and visual features improves generalization
3. Incorporating stylized augmentations enhances domain diversity and robustness
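The findings do not say how the stylized augmentations are produced. One widely used feature-level stylization technique is adaptive instance normalization (AdaIN), which re-normalizes a content feature map to the channel-wise mean and standard deviation of a style feature map: AdaIN(x, y) = σ(y)(x − μ(x))/σ(x) + μ(y). The sketch below applies it to a single channel stored as a flat list; treating AdaIN as the paper's augmentation is an assumption, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_std(values: Sequence[float], eps: float = 1e-6) -> Tuple[float, float]:
    """Mean and (eps-stabilized) population standard deviation of one channel."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance + eps)


def adain_stylize(
    content: Sequence[float], style: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """AdaIN for a single channel: sigma(y) * (x - mu(x)) / sigma(x) + mu(y).

    `content` and `style` are the flattened spatial activations of one channel
    from two different images; the output keeps the content's spatial pattern
    but takes on the style's mean and standard deviation.
    """
    c_mean, c_std = mean_std(content, eps)
    s_mean, s_std = mean_std(style, eps)
    return [s_std * (x - c_mean) / c_std + s_mean for x in content]


stylized = adain_stylize([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print([round(v, 3) for v in stylized])  # [10.0, 20.0, 30.0]
```

Practical implementations apply the same arithmetic per channel over batches of feature maps; the list-based version only shows how the statistics are swapped.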

Abstract

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model…
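The abstract is cut off before it describes how the disentangled text features guide the visual prompts. As a hedged sketch, assuming CLIP-style classification (cosine similarity between an image embedding and per-class text embeddings, divided by a temperature), a text-guided objective could combine cross-entropy over domain-invariant class text features with a term that pulls the image embedding toward its own class's invariant text feature. The function names, temperature, and guidance weight are hypothetical; in a real setup the image embedding would come from a CLIP image encoder carrying the learnable visual prompts.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_logits(
    image_feat: Sequence[float], text_feats: List[Sequence[float]], temperature: float
) -> List[float]:
    """One logit per class: cosine similarity to that class's text feature / temperature."""
    return [cosine_similarity(image_feat, t) / temperature for t in text_feats]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def text_guided_loss(
    image_feat: Sequence[float],
    invariant_text_feats: List[Sequence[float]],
    target: int,
    temperature: float = 0.01,
    guidance_weight: float = 1.0,
) -> float:
    """Hypothetical objective: classify against domain-invariant text features and
    pull the image feature toward its own class's invariant text feature."""
    logits = clip_style_logits(image_feat, invariant_text_feats, temperature)
    guidance = 1.0 - cosine_similarity(image_feat, invariant_text_feats[target])
    return cross_entropy(logits, target) + guidance_weight * guidance


text_feats = [[1.0, 0.0], [0.0, 1.0]]  # toy invariant text features for 2 classes
correct = text_guided_loss([1.0, 0.0], text_feats, target=0, temperature=1.0)
wrong = text_guided_loss([1.0, 0.0], text_feats, target=1, temperature=1.0)
print(correct < wrong)  # True
```

Because the text side is frozen in this sketch, only the image embedding (and, in practice, the visual prompts that produce it) would receive gradients, which matches the idea of text features acting as fixed guidance for visual prompt tuning.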

Taxonomy

Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training