GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning

Guangyue Xu; Joyce Chai; Parisa Kordjamshidi

arXiv:2311.05729 · cs.CV · November 13, 2023 · 1 citation

TL;DR

GIPCOL introduces a graph-structured soft prompt learning method that enhances compositional zero-shot learning in vision-language models, achieving state-of-the-art results on multiple benchmarks.

Contribution

The paper proposes a novel graph-injected soft prompting approach that explicitly encodes compositional structure for improved CZSL performance.

Findings

01

GIPCOL outperforms previous methods on MIT-States, UT-Zappos, and C-GQA datasets.

02

The structured soft prompt effectively captures compositional relationships.

03

Analysis reveals when and why GIPCOL operates well with CLIP backbones.

Abstract

Pre-trained vision-language models (VLMs) have achieved promising success in many fields, especially with the prompt learning paradigm. In this work, we propose GIPCOL (Graph-Injected Soft Prompting for COmpositional Learning) to better explore the compositional zero-shot learning (CZSL) ability of VLMs within the prompt-based learning framework. The soft prompt in GIPCOL is structured and consists of prefix learnable vectors, an attribute label, and an object label. In addition, the attribute and object labels in the soft prompt are designated as nodes in a compositional graph. The compositional graph is constructed from the compositional structure of the objects and attributes extracted from the training data, and it feeds the updated concept representations into the soft prompt so that the prompt captures this compositional structure for better prompting in CZSL. With the new prompting…
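The structure the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_graph`, `propagate`, `build_prompt`), the toy 2-dimensional embeddings, and the single mean-aggregation step standing in for a graph neural network layer are all assumptions made for illustration. The sketch only shows how attribute and object nodes linked by training pairs could exchange information before being placed after the prefix vectors in a prompt.

```python
from collections import defaultdict

def build_graph(train_pairs):
    """Undirected graph with one edge per (attribute, object) pair seen in training."""
    neighbors = defaultdict(set)
    for attr, obj in train_pairs:
        a, o = ("attr", attr), ("obj", obj)
        neighbors[a].add(o)
        neighbors[o].add(a)
    return neighbors

def propagate(embeddings, neighbors):
    """One mean-aggregation step with a self-loop (assumed stand-in for a GNN layer)."""
    updated = {}
    for node, vec in embeddings.items():
        group = [vec] + [embeddings[n] for n in sorted(neighbors.get(node, ()))]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

def build_prompt(prefix, embeddings, attr, obj):
    """Structured soft prompt: [prefix vectors..., attribute vector, object vector]."""
    return prefix + [embeddings[("attr", attr)], embeddings[("obj", obj)]]

# Hypothetical toy data: three seen compositions and 2-d concept embeddings.
train_pairs = [("wet", "dog"), ("wet", "street"), ("old", "dog")]
embeddings = {
    ("attr", "wet"): [1.0, 0.0],
    ("attr", "old"): [0.0, 1.0],
    ("obj", "dog"): [2.0, 2.0],
    ("obj", "street"): [4.0, 0.0],
}
prefix = [[0.0, 0.0], [0.0, 0.0]]  # would be learnable vectors in a real model

graph = build_graph(train_pairs)
updated = propagate(embeddings, graph)

# Prompt for the unseen composition "old street", built from graph-updated nodes.
prompt = build_prompt(prefix, updated, "old", "street")
print(prompt)
```

In this toy setup "old street" never appears in training, yet its prompt is built from node vectors that have already absorbed information from the seen pairs "old dog" and "wet street". In a real system the prefix and node embeddings would be trained, and the finished prompt would be passed through the VLM's text encoder.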

Code & Models

Repositories

hlr/gipcol (PyTorch, official)

Videos

GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training