Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

Baoshuo Kan; Teng Wang; Wenpeng Lu; Xiantong Zhen; Weili Guan; Feng Zheng

arXiv:2308.11186 · cs.CV · August 23, 2023 · 2 cites

TL;DR

This paper introduces a Knowledge-Aware Prompt Tuning framework for vision-language models that incorporates external knowledge to improve generalization to unseen classes, especially in few-shot image classification tasks.

Contribution

The paper proposes a novel knowledge-aware prompt tuning method that leverages external knowledge and visual cues to enhance model generalization to unseen categories.

Findings

01. Significant improvement in generalization to unseen classes

02. Achieves a 3.22% absolute gain over the state of the art on new classes

03. Effective across 11 benchmark datasets

Abstract

Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity of transfer learning. Recently, learnable prompts achieve state-of-the-art performance, which however are prone to overfit to seen classes, failing to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models. Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects. Specifically, we design two complementary types of knowledge-aware prompts for the text encoder to leverage the distinctive characteristics of category-related external knowledge. The discrete prompt extracts the key information from descriptions of an object category, and the learned continuous prompt captures overall contexts. We further…
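The general mechanism the abstract builds on, scoring an image against one text prompt per class, can be illustrated with a toy sketch. This is not the KAPT implementation: `encode_prompt` is a hypothetical stand-in for a real text encoder, the context vectors are fixed numbers standing in for a learned continuous prompt, and the embeddings, dimensions, and temperature are illustrative assumptions.

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def encode_prompt(context_vectors, class_vector):
    # Hypothetical toy "text encoder": average the prompt tokens
    # (shared continuous context vectors plus the class-name embedding).
    tokens = context_vectors + [class_vector]
    dim = len(class_vector)
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]

def classify(image_feature, context_vectors, class_vectors, temperature=0.01):
    # One prompt per class; logits are temperature-scaled cosine similarities.
    text_features = [encode_prompt(context_vectors, c) for c in class_vectors]
    logits = [cosine(image_feature, t) / temperature for t in text_features]
    return softmax(logits)

# Assumed toy data: in a real system the context would be learned by
# backpropagation and the features would come from pre-trained encoders.
context = [[0.1, 0.1, 0.1, 0.1], [0.0, 0.1, 0.0, 0.1]]
classes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
image = [0.1, 0.9, 0.2, 0.0]

probs = classify(image, context, classes)
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(predicted, probs)
```

Because the context vectors are shared across classes, an unseen class can be scored by supplying only its class embedding, which is why the quality of the prompt matters for generalization to new categories.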

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training