Learning to Compose Soft Prompts for Compositional Zero-Shot Learning

Nihal V. Nayak; Peilin Yu; Stephen H. Bach

arXiv:2204.03574·cs.LG·April 25, 2023·41 cites

TL;DR

This paper proposes compositional soft prompting (CSP), a parameter-efficient method that enhances zero-shot compositionality in vision-language models by learning attribute and object tokens, leading to significant improvements on benchmark datasets.

Contribution

CSP introduces learnable attribute-object tokens for better zero-shot compositionality in large-scale pretrained models, outperforming existing soft prompting methods and baseline models.

Findings

01. CSP outperforms CLIP by an average of 10.9 percentage points on AUC.

02. CSP outperforms CoOp by an average of 5.8 percentage points on AUC.

03. CSP improves generalization to higher-order attribute compositions.

Abstract

We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. We develop CSP for compositional zero-shot learning, the task of predicting unseen attribute-object compositions (e.g., old cat and young tiger). VLMs have a flexible text encoder that can represent arbitrary classes as natural language prompts but they often underperform task-specific architectures on the compositional zero-shot benchmark datasets. CSP treats the attributes and objects that define classes as learnable tokens of vocabulary. During training, the vocabulary is tuned to recognize classes that compose tokens in multiple ways (e.g., old cat and white cat). At test time, we recompose the learned attribute-object vocabulary in new combinations to recognize novel classes. We show…
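The recomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the text encoder is replaced by a hypothetical mean-pooling stand-in, the attribute and object vectors are random rather than trained, and every name here (`compose_prompt`, `toy_text_encoder`, `class_probabilities`, `DIM`, the `temperature` value) is an assumption made for illustration. The only idea taken from the abstract is that each attribute and each object owns one learnable token, and that a class prompt is built by placing one of each after a shared prefix, so unseen pairs can be scored without new parameters.

```python
import math
import random

random.seed(0)
DIM = 8  # toy embedding width (hypothetical; a real VLM uses a much larger one)

def random_vector():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Learnable vocabulary: one vector per attribute and one per object.
# In CSP these would be tuned on seen pairs; here they are random stand-ins.
attr_emb = {a: random_vector() for a in ["old", "young", "white"]}
obj_emb = {o: random_vector() for o in ["cat", "tiger"]}

# Shared prefix context, standing in for the tokens of a fixed template.
prefix = [random_vector() for _ in range(3)]

def compose_prompt(attr, obj):
    """Token sequence [prefix..., attribute, object] for one class."""
    return prefix + [attr_emb[attr], obj_emb[obj]]

def toy_text_encoder(tokens):
    """Stand-in for a frozen text encoder: mean-pool, then L2-normalize."""
    pooled = [sum(column) / len(tokens) for column in zip(*tokens)]
    return normalize(pooled)

def class_probabilities(image_feature, pairs, temperature=0.07):
    """Softmax over cosine similarities between the image and each prompt."""
    image_feature = normalize(image_feature)
    logits = []
    for attr, obj in pairs:
        text = toy_text_encoder(compose_prompt(attr, obj))
        cosine = sum(t * i for t, i in zip(text, image_feature))
        logits.append(cosine / temperature)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Seen pairs would supply the training signal; unseen pairs reuse the
# same attribute and object vectors in new combinations at test time.
seen_pairs = [("old", "cat"), ("white", "cat"), ("young", "tiger")]
unseen_pairs = [("old", "tiger"), ("young", "cat")]
candidate_pairs = seen_pairs + unseen_pairs

image_feature = random_vector()  # stand-in for an image encoder output
probs = class_probabilities(image_feature, candidate_pairs)
prediction = candidate_pairs[probs.index(max(probs))]
```

In a trained system, only the entries of `attr_emb` and `obj_emb` would receive gradient updates, which is what keeps the approach parameter-efficient; the sketch omits training entirely and shows only how prompts are composed and scored.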

Code & Models

Repositories

batsresearch/csp
PyTorch · Official

Datasets

nihalnayak/cgqa
dataset · 40 dl

Videos

Learning to Compose Soft Prompts for Compositional Zero-Shot Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Context Optimization