Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

Enming Zhang; Bingke Zhu; Yingying Chen; Qinghai Miao; Ming Tang; Jinqiao Wang

arXiv:2404.10357 · cs.CV · August 19, 2025 · 1 citation

TL;DR

This paper introduces CoKnow, a framework that enhances prompt learning in vision-language models by integrating multi-knowledge representations, outperforming previous methods on 11 datasets.

Contribution

The paper proposes CoKnow, a framework that enriches prompt tuning with diverse contextual knowledge generated by lightweight semantic mappers, addressing the limited diversity of prompt templates.

Findings

01 CoKnow outperforms previous methods on 11 datasets.

02 Lightweight semantic mappers effectively generate multi-knowledge representations.

03 Enhanced prompt diversity improves downstream task accuracy.

Abstract

Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential. However, one key limitation is the lack of diversity in prompt templates, whether they are hand-crafted or learned through additional modules. This limitation restricts the capabilities of pretrained VLMs and can result in incorrect predictions in downstream tasks. To address this challenge, we propose Context Optimization with Multi-Knowledge Representation (CoKnow), a framework that enhances Prompt Learning for VLMs with rich contextual knowledge. To facilitate CoKnow during inference, we trained lightweight semantic knowledge mappers, which are capable of generating Multi-Knowledge Representation for an input image without requiring additional priors.…
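
The abstract's point about prompt diversity can be illustrated with plain prompt ensembling, the simpler technique of scoring an image against several text embeddings per class. The sketch below is not the CoKnow method and does not implement its semantic knowledge mappers; the function names, toy 3-dimensional features, and temperature value are all assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_feat, class_prompt_feats, temperature=0.01):
    """Score an image feature against several prompt embeddings per class.

    class_prompt_feats maps a class name to a list of text embeddings,
    one per prompt template (hypothetical stand-ins for encoder outputs).
    """
    names = list(class_prompt_feats)
    logits = []
    for name in names:
        feats = [normalize(f) for f in class_prompt_feats[name]]
        # Average the normalized prompt embeddings into one class prototype.
        prototype = [sum(col) / len(feats) for col in zip(*feats)]
        logits.append(cosine(image_feat, prototype) / temperature)
    return dict(zip(names, softmax(logits)))

# Toy features standing in for real image and text encoder outputs.
image_feat = [0.9, 0.1, 0.0]
prompts = {
    "cat": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "dog": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.1]],
}
probs = classify(image_feat, prompts)
prediction = max(probs, key=probs.get)
```

With a single template per class, the class prototype depends entirely on that one embedding; averaging several templates reduces that dependence, which is the general motivation for more diverse prompts.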

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training