Tree of Attributes Prompt Learning for Vision-Language Models

Tong Ding; Wanhua Li; Zhongqi Miao; Hanspeter Pfister

arXiv:2410.11201 · cs.CV · April 22, 2025

Open Access · 1 Repo · 1 Video

TL;DR

This paper introduces Tree of Attributes Prompt learning (TAP), a method that leverages structured attribute hierarchies generated by LLMs to improve vision-language model adaptation for various classification tasks.

Contribution

TAP distills structured knowledge graphs from LLMs and incorporates explicit visual attribute learning, enhancing zero-shot and few-shot classification performance.

Findings

01 Outperforms state-of-the-art methods on multiple datasets.

02 Improves zero-shot base-to-novel generalization.

03 Enhances cross-dataset transfer and few-shot classification.

Abstract

Prompt learning has proven effective in adapting vision-language models to downstream tasks. However, existing methods usually combine learnable prompt tokens with only the category names to obtain textual features, which fails to fully leverage the rich context indicated by the category name. To address this issue, we propose Tree of Attributes Prompt learning (TAP), which first instructs LLMs to generate a tree of attributes with a "concept - attribute - description" structure for each category, and then learns the hierarchy with vision and text prompt tokens. Unlike existing methods that merely augment category names with a set of unstructured descriptions, our approach essentially distills structured knowledge graphs associated with class names from LLMs. Furthermore, our approach introduces text and vision prompts designed to explicitly learn the corresponding visual…
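
The "concept - attribute - description" structure named in the abstract can be pictured as a small tree per category. The sketch below is only an illustration of that data structure under stated assumptions, not the authors' implementation: the class names `Attribute` and `ConceptTree`, the helper `build_tree`, the prompt template, and the hand-written example descriptions are all hypothetical, and no LLM is called.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One attribute node with its free-text descriptions (the leaves)."""
    name: str
    descriptions: list[str] = field(default_factory=list)


@dataclass
class ConceptTree:
    """Root node: a category (concept) with its attribute children."""
    concept: str
    attributes: list[Attribute] = field(default_factory=list)

    def prompts(self) -> dict[str, list[str]]:
        # Hypothetical template: one text prompt per description,
        # grouped by attribute so the hierarchy is preserved.
        return {
            attr.name: [
                f"a photo of a {self.concept}, {desc}"
                for desc in attr.descriptions
            ]
            for attr in self.attributes
        }


def build_tree(concept: str, raw: dict[str, list[str]]) -> ConceptTree:
    """Build a tree from an attribute -> descriptions mapping.

    In practice the mapping would come from an LLM response; here it
    is supplied by hand (an assumption made for illustration).
    """
    return ConceptTree(
        concept,
        [Attribute(name, list(descs)) for name, descs in raw.items()],
    )


# Hand-written example data, not taken from the paper.
example = build_tree(
    "cat",
    {
        "fur": ["short dense coat", "striped pattern"],
        "ears": ["pointed upright ears"],
    },
)

for attribute_name, texts in example.prompts().items():
    print(attribute_name, texts)
```

Grouping prompts by attribute, instead of flattening them into one list, is what distinguishes a tree from the unstructured description sets the abstract contrasts it with.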

Code & Models

Repositories

hhenryd/tap (PyTorch, official)

Videos

Tree of Attributes Prompt Learning for Vision-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling

Methods: Sparse Evolutionary Training