InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models

Shuchang Zhou; Jiwei Wei; Shiyuan He; Yuyang Zhou; Chaoning Zhang; Jie Zou; Ning Xie; Yang Yang

arXiv:2502.19777·cs.CV·April 1, 2025

TL;DR

InPK enhances vision-language models by infusing class-specific prior knowledge into learnable tokens and reinforcing their interaction across multiple feature levels, leading to improved zero/few-shot recognition performance.

Contribution

The paper introduces InPK, a novel method that incorporates prior knowledge into prompt tuning for VLMs, improving generalization and discriminative ability for unseen classes.

Findings

01. Outperforms state-of-the-art methods on 11 datasets.

02. Effectively captures class-specific and universal visual concepts.

03. Enhances zero/few-shot recognition accuracy.

Abstract

Prompt tuning has become a popular strategy for adapting Vision-Language Models (VLMs) to zero/few-shot visual recognition tasks. Some prompting techniques introduce prior knowledge due to its richness, but when learnable tokens are randomly initialized and disconnected from prior knowledge, they tend to overfit on seen classes and struggle with domain shifts for unseen ones. To address this issue, we propose the InPK model, which infuses class-specific prior knowledge into the learnable tokens during initialization, thus enabling the model to explicitly focus on class-relevant information. Furthermore, to mitigate the weakening of class information by multi-layer encoders, we continuously reinforce the interaction between learnable tokens and prior knowledge across multiple feature levels. This progressive interaction allows the learnable tokens to better capture the fine-grained…
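
The two ideas in the abstract (infusing prior knowledge into the learnable tokens at initialization, then reinforcing that interaction at each feature level) can be illustrated with a small sketch. This is not the authors' implementation: the residual cross-attention in `infuse`, the `toy_layer` stand-in for an encoder layer, and all names and sizes below are assumptions chosen only to make the data flow concrete.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def infuse(tokens, prior):
    """Assumed infusion step: each learnable token attends over the
    prior-knowledge vectors and adds the weighted mix back (residual)."""
    d = len(tokens[0])
    out = []
    for t in tokens:
        weights = softmax([dot(t, p) / math.sqrt(d) for p in prior])
        mix = [sum(w * p[j] for w, p in zip(weights, prior)) for j in range(d)]
        out.append([tj + mj for tj, mj in zip(t, mix)])
    return out

def toy_layer(rows, w):
    """Hypothetical stand-in for one encoder layer: tanh(x @ w)."""
    return [
        [math.tanh(sum(r[i] * w[i][j] for i in range(len(r))))
         for j in range(len(w[0]))]
        for r in rows
    ]

def encode_with_infusion(tokens, prior, layer_weights):
    # Infuse once before the first layer (the "initialization" step) ...
    tokens = infuse(tokens, prior)
    # ... then re-infuse after every layer so class information is not diluted.
    for w in layer_weights:
        tokens = toy_layer(tokens, w)
        prior = toy_layer(prior, w)
        tokens = infuse(tokens, prior)
    return tokens

rng = random.Random(0)
d, n_tokens, n_prior, n_layers = 8, 4, 5, 3  # illustrative sizes only
tokens = [[rng.gauss(0, 0.02) for _ in range(d)] for _ in range(n_tokens)]
prior = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_prior)]
weights = [
    [[rng.gauss(0, d ** -0.5) for _ in range(d)] for _ in range(d)]
    for _ in range(n_layers)
]
out = encode_with_infusion(tokens, prior, weights)
print(len(out), len(out[0]))  # prints "4 8"
```

In a real setting the prior vectors would presumably come from embedded class-specific text, and the tokens would be trained rather than fixed; the sketch only shows where the infusion happens relative to the encoder layers.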

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Focus