GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
Guangyue Xu, Joyce Chai, Parisa Kordjamshidi

TL;DR
GIPCOL introduces a graph-structured soft prompt learning method that enhances compositional zero-shot learning in vision-language models, achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper proposes a novel graph-injected soft prompting approach that explicitly encodes compositional structure for improved CZSL performance.
Findings
GIPCOL outperforms previous methods on MIT-States, UT-Zappos, and C-GQA datasets.
The structured soft prompt effectively captures compositional relationships.
Analysis reveals when and why GIPCOL operates well with CLIP backbones.
Abstract
Pre-trained vision-language models (VLMs) have achieved promising success in many fields, especially with the prompt learning paradigm. In this work, we propose GIP-COL (Graph-Injected Soft Prompting for COmpositional Learning) to better explore the compositional zero-shot learning (CZSL) ability of VLMs within the prompt-based learning framework. The soft prompt in GIPCOL is structured and consists of learnable prefix vectors, an attribute label, and an object label. In addition, the attribute and object labels in the soft prompt are designated as nodes in a compositional graph. The compositional graph is constructed from the compositional structure of the objects and attributes extracted from the training data, and it feeds the updated concept representations into the soft prompt so that the prompt captures this compositional structure for better prompting in CZSL. With the new prompting…
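The prompt structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy training pairs, the embedding size, and the single round of mean aggregation (a stand-in for whatever graph network the paper actually uses) are all assumptions made for illustration, and the random vectors stand in for learned embeddings.

```python
import random

# Hypothetical toy setup: none of these names or sizes come from the paper.
DIM = 4          # embedding size (assumed)
NUM_PREFIX = 3   # number of learnable prefix vectors (assumed)
rng = random.Random(0)


def random_vector():
    """Stand-in for a learnable embedding."""
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def build_graph(train_pairs):
    """Link each attribute node to the object nodes it co-occurs with in training."""
    adj = {}
    for attr, obj in train_pairs:
        a, o = ("attr", attr), ("obj", obj)
        adj.setdefault(a, set()).add(o)
        adj.setdefault(o, set()).add(a)
    return adj


def propagate(emb, adj):
    """One round of mean aggregation over each node and its neighbours."""
    updated = {}
    for node, neighbours in adj.items():
        group = [emb[node]] + [emb[n] for n in sorted(neighbours)]
        updated[node] = [sum(v[i] for v in group) / len(group) for i in range(DIM)]
    return updated


def make_prompt(prefix, emb, attr, obj):
    """Structured soft prompt: prefix vectors, then attribute and object embeddings."""
    return prefix + [emb[("attr", attr)], emb[("obj", obj)]]


# Attribute-object pairs seen during training (toy data).
train_pairs = [("red", "apple"), ("red", "car"), ("old", "car")]
graph = build_graph(train_pairs)

initial = {node: random_vector() for node in graph}
prefix = [random_vector() for _ in range(NUM_PREFIX)]

# Inject graph structure into the concept embeddings before prompting.
updated = propagate(initial, graph)

# "old apple" never appears in training, but both concepts are graph nodes.
prompt = make_prompt(prefix, updated, "old", "apple")
print(len(prompt), "prompt vectors of size", DIM)
```

In this sketch the prompt for the unseen pair "old apple" is assembled from embeddings that have already absorbed information from the seen pairs, which is the intuition the abstract gives for injecting the graph into the soft prompt. In a real system the resulting vectors would be passed to the VLM text encoder and trained end to end, a step this sketch omits.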
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Contrastive Language-Image Pre-training
