Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics

Minglei Chen, Weilong Wang, Jiang Duan, and Ye Deng

arXiv:2604.03980 · cs.CV · April 7, 2026

TL;DR

This paper introduces Gram-Anchored Prompt Learning (GAPL), a method that enhances vision-language model adaptation by incorporating second-order statistical features, improving robustness across diverse domains.

Contribution

The paper proposes a novel GAPL framework that integrates second-order Gram matrix features into prompt learning for better domain adaptation of VLMs.

Findings

01. GAPL outperforms existing methods on multiple benchmarks.

02. Second-order features improve robustness to domain shifts.

03. Experimental results validate the effectiveness of the proposed approach.

Abstract

Parameter-efficient prompt learning has become the de facto standard for adapting Vision-Language Models (VLMs) to downstream tasks. Existing approaches predominantly focus on aligning text prompts with first-order visual features (i.e., spatial feature maps). While effective for fine-grained semantic discrimination, we argue that relying solely on first-order information is insufficient for robust adaptation, as these spatially entangled features are highly susceptible to domain shifts and local noise. In this work, we propose Gram-Anchored Prompt Learning (GAPL) for Vision-Language Models via Second-Order Statistics, a framework that synergizes local semantic alignment with global structural consistency. Methodologically, we introduce an additional second-order statistical stream via Gram matrices that augments the standard first-order spatial interaction. By…
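To make the term "second-order statistics" concrete, here is a minimal sketch of a Gram matrix computed from a spatial feature map. It is not the paper's implementation: the (N, C) layout, the 1/N normalization, and the names `gram_matrix` and `feature_map` are assumptions chosen for illustration, and the abstract does not say how the Gram features are combined with the prompts. The sketch shows only the general property that a Gram matrix records channel co-activations while discarding where in the image they occur.

```python
def gram_matrix(features):
    """Channel-by-channel Gram matrix of an (N, C) feature map.

    features: list of N spatial tokens, each a list of C channel values.
    Returns a C x C list of lists G with
        G[i][j] = (1 / N) * sum over n of features[n][i] * features[n][j].
    The 1/N normalization is an illustrative assumption, not taken from the paper.
    """
    n = len(features)
    c = len(features[0])
    gram = [[0.0] * c for _ in range(c)]
    for token in features:
        for i in range(c):
            for j in range(c):
                gram[i][j] += token[i] * token[j]
    return [[value / n for value in row] for row in gram]


# Toy first-order feature map: 4 spatial positions, 3 channels (hypothetical values).
feature_map = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]

gram = gram_matrix(feature_map)

# Reordering the spatial positions changes the first-order map,
# but the Gram matrix is a sum over positions and stays the same.
reordered = feature_map[::-1]
gram_reordered = gram_matrix(reordered)
```

In this toy case `gram` and `gram_reordered` are identical even though `feature_map` and `reordered` differ position by position, which is the sense in which a Gram matrix summarizes global structure rather than spatial layout.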
