Prompting Large Pre-trained Vision-Language Models For Compositional   Concept Learning

Guangyue Xu; Parisa Kordjamshidi; Joyce Chai

arXiv:2211.05077 · cs.CV · November 10, 2022 · 6 cites

TL;DR

This paper introduces PromptCompVL, a prompt-based method that improves the compositional zero-shot learning ability of vision-language models by combining learnable soft prompts with soft embeddings of primitive concepts, achieving state-of-the-art results on the MIT-States dataset.

Contribution

The paper proposes a prompt-based approach to compositional zero-shot learning (CZSL) in vision-language models that combines soft prompting with a soft-embedding layer for primitive concepts, yielding consistent improvements over other CLIP-based methods.

Findings

1. Achieves state-of-the-art results on the MIT-States dataset.
2. Demonstrates consistent improvement over other CLIP-based methods.
3. Validates the effectiveness of soft-prompting strategies for CZSL.

Abstract

This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model (PromptCompVL) to solve the compositional zero-shot learning (CZSL) problem. PromptCompVL makes two design choices. First, it uses soft prompting instead of hard prompting to inject learnable parameters that reprogram VLMs for compositional learning. Second, to address the compositional challenge, it uses a soft-embedding layer to learn primitive concepts in different combinations. By combining soft embedding and soft prompting, PromptCompVL achieves state-of-the-art performance on the MIT-States dataset. Furthermore, our proposed model achieves consistent improvement over other CLIP-based methods, which shows the effectiveness of the proposed prompting strategies for CZSL.
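The two design choices in the abstract can be illustrated with a toy sketch. The abstract does not specify PromptCompVL's exact architecture, so everything below is an assumption: a CLIP-style setup in which a few learnable context vectors (the soft prompt) are prepended to learnable attribute and object embeddings (the soft embeddings), the token sequence passes through a frozen text encoder, and every attribute-object pair is scored against an image feature by temperature-scaled cosine similarity. The class name `SoftPromptCZSL`, the constants `DIM` and `N_CTX`, the random linear map standing in for CLIP's text encoder, and the 0.07 temperature are all hypothetical, and training is omitted.

```python
import math
import random

DIM = 8    # hypothetical embedding width; real CLIP embeddings are far wider
N_CTX = 4  # hypothetical number of learnable soft-prompt context vectors


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


class SoftPromptCZSL:
    """Toy model (hypothetical): soft prompt + soft primitive embeddings."""

    def __init__(self, attributes, objects, seed=0):
        rng = random.Random(seed)

        def vec(scale):
            return [rng.gauss(0.0, scale) for _ in range(DIM)]

        self.attributes, self.objects = list(attributes), list(objects)
        # Soft prompting: learnable context vectors replace a hand-written
        # (hard) template such as "a photo of a".
        self.ctx = [vec(0.02) for _ in range(N_CTX)]
        # Soft embeddings: one learnable vector per primitive concept, shared
        # by every attribute-object pair that contains it.
        self.attr_emb = {a: vec(1.0) for a in self.attributes}
        self.obj_emb = {o: vec(1.0) for o in self.objects}
        # Frozen stand-in for CLIP's text encoder (an assumption, not CLIP):
        # a fixed random linear map applied to the mean token embedding.
        self.proj = [vec(1.0) for _ in range(DIM)]

    def encode_pair(self, attr, obj):
        """Encode [ctx_1..ctx_k, attr, obj] into a unit-norm text feature."""
        tokens = self.ctx + [self.attr_emb[attr], self.obj_emb[obj]]
        mean = [sum(col) / len(tokens) for col in zip(*tokens)]
        out = [sum(w * x for w, x in zip(row, mean)) for row in self.proj]
        return normalize(out)

    def pair_probs(self, image_feat, temperature=0.07):
        """Softmax over all attribute-object pairs, seen or unseen."""
        img = normalize(image_feat)
        pairs = [(a, o) for a in self.attributes for o in self.objects]
        logits = [
            sum(x * y for x, y in zip(img, self.encode_pair(a, o))) / temperature
            for a, o in pairs
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return {p: e / z for p, e in zip(pairs, exps)}


if __name__ == "__main__":
    model = SoftPromptCZSL(["wet", "dry", "old"], ["dog", "cat"])
    # Pretend the image encoder produced exactly the "old cat" text feature.
    probs = model.pair_probs(model.encode_pair("old", "cat"))
    print(max(probs, key=probs.get))
```

Because each primitive embedding is shared across every pair that contains it, the same vectors can be recombined to score compositions never seen during training, which is the property CZSL depends on.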

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Text and Document Classification Technologies