Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Yubin Wang; Xinyang Jiang; De Cheng; Dongsheng Li; Cairong Zhao

arXiv:2312.06323·cs.CV·December 12, 2023·1 cites

TL;DR

This paper introduces Hierarchical Prompt Tuning (HPT), a novel method that leverages structured linguistic knowledge and hierarchical modeling to improve prompt learning for vision-language models, outperforming existing methods.

Contribution

It proposes a hierarchical prompt tuning framework that models both structured and conventional knowledge using a relationship-guided attention module and multi-level prompts.

Findings

01. HPT outperforms state-of-the-art methods in experiments.

02. HPT effectively models complex entity-attribute relationships.

03. HPT demonstrates strong generalization across tasks.

Abstract

Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions fall short of structured information that effectively represents the interconnections among entities or attributes linked to a particular category. To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations. Preexisting prompt tuning methods exhibit inadequacies in managing this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables…
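
The per-description graph mentioned in the abstract can be pictured with a small data structure. The following is only a minimal sketch, not the authors' implementation: the class name `DescriptionGraph`, its fields, the relation labels, and the "water lily" example are all hypothetical and hand-written for illustration. It assumes a graph holds entity nodes, attribute nodes, and labeled edges between them.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionGraph:
    """Hypothetical container for the structure of one category description."""
    category: str
    entities: set[str] = field(default_factory=set)
    attributes: set[str] = field(default_factory=set)
    # Each relation is a (head, relation_label, tail) triple.
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_relation(self, head: str, label: str, tail: str) -> None:
        """Record a labeled edge; both endpoints must already be nodes."""
        nodes = self.entities | self.attributes
        if head not in nodes or tail not in nodes:
            raise ValueError(f"unknown node in relation: {head!r} -> {tail!r}")
        self.relations.append((head, label, tail))

    def neighbors(self, node: str) -> list[tuple[str, str]]:
        """Return (label, other_node) pairs for every edge touching `node`."""
        out = []
        for head, label, tail in self.relations:
            if head == node:
                out.append((label, tail))
            elif tail == node:
                out.append((label, head))
        return out


# Illustrative, hand-written example (an assumption, not data from the paper).
graph = DescriptionGraph(category="water lily")
graph.entities.update({"flower", "leaf"})
graph.attributes.update({"white", "round", "floating"})
graph.add_relation("flower", "has color", "white")
graph.add_relation("leaf", "has shape", "round")
graph.add_relation("leaf", "has state", "floating")
```

Keeping relations as explicit triples makes the entity-attribute links directly queryable, which a flat text description does not offer; how HPT actually encodes and consumes these links is not specified in the text above.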

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning