Visual Adaptive Prompting for Compositional Zero-Shot Learning
Kyle Stein, Arash Mahyari, Guillermo Francia, Eman El-Sheikh

TL;DR
This paper introduces a Visual Adaptive Prompting System (VAPS) that dynamically retrieves visual prompts based on image features, significantly improving compositional zero-shot learning performance in vision-language models.
Contribution
The paper proposes a novel visual prompt retrieval mechanism and a visual prompt adapter, enhancing the generalization of VLMs for CZSL tasks through adaptive visual prompts.
Findings
Achieves state-of-the-art results on three CZSL benchmarks.
Effective in both closed-world and open-world settings.
Demonstrates improved generalization over static prompting methods.
Abstract
Vision-Language Models (VLMs) have demonstrated impressive multimodal capabilities in learning joint representations of visual and textual data, making them powerful tools for tasks such as Compositional Zero-Shot Learning (CZSL). CZSL requires models to generalize to novel combinations of visual primitives--such as attributes and objects--that were not explicitly encountered during training. Recent works in prompting for CZSL have focused on modifying inputs for the text encoder, often using static prompts that do not change across varying visual contexts. However, these approaches struggle to fully capture varying visual contexts, as they focus on text adaptation rather than leveraging visual features for compositional reasoning. To address this, we propose a Visual Adaptive Prompting System (VAPS) that leverages a learnable visual prompt repository and similarity-based retrieval…
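The abstract's similarity-based retrieval from a learnable prompt repository can be illustrated with a small sketch. This is not the authors' implementation: the pairing of each prompt with a key vector, the use of cosine similarity, the top-k selection, and every name and shape below (`retrieve_prompts`, `keys`, `prompts`, `top_k`) are assumptions made for illustration. Random arrays stand in for learned parameters and for the image encoder's output.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_prompts(image_feat, keys, prompts, top_k=2):
    """Pick the top_k prompts whose keys are most similar to the image feature.

    image_feat: (d,)       feature of one image (hypothetical encoder output)
    keys:       (n, d)     one key per repository entry (assumed design)
    prompts:    (n, p, d)  one prompt of p tokens per repository entry
    """
    # Cosine similarity between the image feature and every key.
    sims = l2_normalize(keys) @ l2_normalize(image_feat)  # shape (n,)
    # Indices of the top_k most similar keys, best first.
    top_idx = np.argsort(-sims)[:top_k]
    return top_idx, sims[top_idx], prompts[top_idx]

# Toy data: random stand-ins for a learned repository.
rng = np.random.default_rng(0)
d, n, p = 8, 5, 3
keys = rng.normal(size=(n, d))
prompts = rng.normal(size=(n, p, d))

# An image feature constructed to lie very close to key 3.
image_feat = keys[3] + 0.01 * rng.normal(size=d)

idx, scores, selected = retrieve_prompts(image_feat, keys, prompts, top_k=2)
print("retrieved entries:", idx)
print("selected prompt tensor shape:", selected.shape)
```

Because the selection depends on the image feature, different images can pull different prompts from the same repository, which is the contrast with static prompts that the abstract draws. In a trained system the keys and prompts would be learned parameters rather than random arrays.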
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Focus · Adapter
