InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
Shuchang Zhou, Jiwei Wei, Shiyuan He, Yuyang Zhou, Chaoning Zhang, Jie Zou, Ning Xie, Yang Yang

TL;DR
InPK enhances vision-language models by infusing class-specific prior knowledge into learnable tokens and reinforcing their interaction across multiple feature levels, leading to improved zero/few-shot recognition performance.
Contribution
The paper introduces InPK, a novel method that incorporates prior knowledge into prompt tuning for VLMs, improving generalization and discriminative ability for unseen classes.
Findings
Outperforms state-of-the-art methods on 11 datasets.
Effectively captures class-specific and universal visual concepts.
Enhances zero/few-shot recognition accuracy.
Abstract
Prompt tuning has become a popular strategy for adapting Vision-Language Models (VLMs) to zero/few-shot visual recognition tasks. Some prompting techniques introduce prior knowledge due to its richness, but when learnable tokens are randomly initialized and disconnected from prior knowledge, they tend to overfit on seen classes and struggle with domain shifts for unseen ones. To address this issue, we propose the InPK model, which infuses class-specific prior knowledge into the learnable tokens during initialization, thus enabling the model to explicitly focus on class-relevant information. Furthermore, to mitigate the weakening of class information by multi-layer encoders, we continuously reinforce the interaction between learnable tokens and prior knowledge across multiple feature levels. This progressive interaction allows the learnable tokens to better capture the fine-grained…
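The two ideas in the abstract, infusing class-specific prior knowledge into the learnable tokens at initialization and then reinforcing that interaction at several feature levels, can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes the infusion is a residual cross-attention from the tokens to a set of prior-knowledge embeddings, and every name in it (`infuse`, `toy_layer`, `encode_class_prompt`, the dimensions, the random "prior" vectors) is hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def infuse(tokens, prior, scale=1.0):
    """Assumed infusion step: residual cross-attention.

    tokens: (n_tok, d) learnable prompt tokens
    prior:  (n_prior, d) embeddings of class-specific prior knowledge
    """
    d = tokens.shape[-1]
    attn = softmax(tokens @ prior.T / np.sqrt(d), axis=-1)  # (n_tok, n_prior)
    return tokens + scale * (attn @ prior)


def toy_layer(x, w):
    """Hypothetical stand-in for one text-encoder layer."""
    return np.tanh(x @ w)


def encode_class_prompt(tokens, prior, layer_weights):
    """Infuse prior knowledge once up front, then again after each layer."""
    x = infuse(tokens, prior)      # infusion at initialization
    for w in layer_weights:
        x = toy_layer(x, w)
        x = infuse(x, prior)       # reinforcement at this feature level
    return x.mean(axis=0)          # pooled class text feature, shape (d,)


rng = np.random.default_rng(0)
d, n_tok, n_prior, n_layers = 16, 4, 6, 3

# One set of tokens shared by all classes; priors differ per class.
shared_tokens = rng.normal(scale=0.02, size=(n_tok, d))
prior_cat = rng.normal(size=(n_prior, d))  # placeholder for real attribute embeddings
prior_dog = rng.normal(size=(n_prior, d))
weights = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(n_layers)]

feat_cat = encode_class_prompt(shared_tokens, prior_cat, weights)
feat_dog = encode_class_prompt(shared_tokens, prior_dog, weights)
```

In this sketch the same tokens yield different text features for different classes because each class supplies its own prior embeddings, which mirrors the abstract's claim that the tokens are steered toward class-relevant information. In a real VLM the toy layers would be the text encoder's transformer blocks, and the prior embeddings would come from actual class-specific knowledge rather than random vectors.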
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Focus
