Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics
Minglei Chen, Weilong Wang, Jiang Duan, and Ye Deng

TL;DR
This paper introduces Gram-Anchored Prompt Learning (GAPL), a method that enhances vision-language model adaptation by incorporating second-order statistical features, improving robustness across diverse domains.
Contribution
The paper proposes a novel GAPL framework that integrates second-order Gram matrix features into prompt learning for better domain adaptation of VLMs.
Findings
GAPL outperforms existing methods on multiple benchmarks.
Second-order features improve robustness to domain shifts.
Experimental results validate the effectiveness of the proposed approach.
Abstract
Parameter-efficient prompt learning has become the de facto standard for adapting Vision-Language Models (VLMs) to downstream tasks. Existing approaches predominantly focus on aligning text prompts with first-order visual features (i.e., spatial feature maps). While effective for fine-grained semantic discrimination, we argue that relying solely on first-order information is insufficient for robust adaptation, as these spatially entangled features are highly susceptible to domain shifts and local noise. In this work, we propose Gram-Anchored Prompt Learning (GAPL) for Vision-Language Models via Second-Order Statistics, a framework that synergizes local semantic alignment with global structural consistency. Methodologically, we introduce an additional second-order statistical stream via Gram matrices that augments the standard first-order spatial interaction. By…
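To make the notion of a second-order statistic concrete, the sketch below computes a Gram matrix from a set of patch features and checks that it does not change when the spatial positions are shuffled. This is a generic illustration, not the authors' implementation: the function name `gram_matrix`, the 7x7 grid of 16-dimensional patch tokens, the random inputs, and the normalization by the number of patches are all assumptions made for the example.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Return the (D, D) Gram matrix of an (N, D) array of patch features.

    Entry (i, j) is the average product of channels i and j over all N
    spatial positions. Dividing by N is an assumed normalization.
    """
    n = features.shape[0]
    return features.T @ features / n


rng = np.random.default_rng(0)

# Hypothetical first-order features: 49 patch tokens (a 7x7 grid), 16 channels.
patches = rng.standard_normal((49, 16))

# Second-order statistics: channel-by-channel correlations.
G = gram_matrix(patches)

# Shuffle the spatial positions and recompute.
shuffled = patches[rng.permutation(patches.shape[0])]
G_shuffled = gram_matrix(shuffled)

print(G.shape)
print(np.allclose(G, G.T))
print(np.allclose(G, G_shuffled))
```

Because the Gram matrix sums over positions, it summarizes how channels co-activate across the whole image rather than where they activate. How GAPL combines this stream with the text prompts is not specified in the truncated abstract, so that step is left out here.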
