CoPL: Contextual Prompt Learning for Vision-Language Understanding

Koustava Goswami; Srikrishna Karanam; Prateksha Udhayanan; K J Joseph; and Balaji Vasan Srinivasan

arXiv:2307.00910·cs.CV·December 13, 2023·1 cites

TL;DR

CoPL introduces a novel prompt learning framework that leverages local image features and dynamic weighting to improve vision-language understanding, especially in out-of-distribution and few-shot scenarios.

Contribution

The paper proposes Contextual Prompt Learning (CoPL), which aligns prompts with local image features and learns to reweight prompts based on image semantics, enhancing model generalization.
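The reweighting idea can be illustrated with a small sketch. This is not the paper's architecture: the function name `reweight_prompts`, the scaled dot-product similarity, and the averaging over patches are all assumptions chosen for illustration. Each local patch feature attends over the prompt vectors, and the per-patch attention is averaged to give one weight per prompt.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def reweight_prompts(local_feats, prompts):
    """Hypothetical helper (not from the paper).

    local_feats: list of patch feature vectors, each of length d
    prompts:     list of prompt vectors, each of length d
    Returns (weights, context): one weight per prompt, summing to 1,
    and the weighted sum of the prompt vectors.
    """
    d = len(prompts[0])
    scale = math.sqrt(d)  # assumed scaled dot-product similarity
    weights = [0.0] * len(prompts)
    for patch in local_feats:
        # Attention of this patch over all prompts.
        attn = softmax([dot(patch, p) / scale for p in prompts])
        for i, a in enumerate(attn):
            weights[i] += a / len(local_feats)  # average over patches
    # Image-conditioned context: prompts combined by their weights.
    context = [sum(w * p[j] for w, p in zip(weights, prompts)) for j in range(d)]
    return weights, context


# Toy usage: both patches point along the first axis, so the first
# prompt receives the larger weight.
weights, context = reweight_prompts(
    [[1.0, 0.0], [0.9, 0.1]],
    [[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy setup, prompts that resemble the local patch features are weighted up instead of all prompts contributing equally, which is the intuition the contribution statement describes.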

Findings

01. Significantly outperforms existing methods on standard datasets.

02. Improves few-shot learning performance.

03. Enhances out-of-distribution generalization.

Abstract

Recent advances in multimodal learning have resulted in powerful vision-language models whose representations generalize across a variety of downstream tasks. Recently, their generalization ability has been further extended by incorporating trainable prompts, borrowed from the natural language processing literature. While such prompt learning techniques have shown impressive results, we identify that these prompts are trained on global image features, which limits them in two respects. First, by using global features, these prompts may focus less on the discriminative foreground of the image, resulting in poor generalization to various out-of-distribution test cases. Second, existing work weights all prompts equally, whereas intuitively prompts should be reweighted according to the semantics of the image. We address these as part of our proposed Contextual Prompt Learning…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Attentive Walk-Aggregating Graph Neural Network