PRE: Vision-Language Prompt Learning with Reparameterization Encoder

Thi Minh Anh Pham; An Duc Nguyen; Cephas Svosve; Vasileios Argyriou; Georgios Tzimiropoulos

arXiv:2309.07760·cs.CV·September 17, 2024·1 citation



TL;DR

PRE introduces a reparameterization encoder for vision-language prompt learning, significantly improving generalization to unseen classes in zero-shot transfer tasks while maintaining efficiency and performance.

Contribution

It proposes a novel reparameterization encoder that enhances prompt generalization to unseen classes in vision-language models, addressing limitations of previous prompt learning methods.
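To make the idea concrete, here is a minimal, dependency-free sketch of CoOp-style prompt learning with a reparameterization step. In CoOp, the hand-written prompt prefix (e.g. "a photo of a") is replaced by M learnable context vectors that are prepended to each class-name embedding, while CLIP itself stays frozen. The sketch passes those context vectors through an encoder before assembling the prompts. Everything named here is hypothetical: `ResidualPromptEncoder`, `build_prompts`, the residual two-layer MLP architecture, the one-embedding-per-class simplification, and the toy dimensions are illustrative assumptions, not the encoder described in the paper.

```python
import random


def linear(x, weight, bias):
    # y = W x + b for a single vector x (weight is a list of rows)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_layer(n_out, n_in, rng, scale=0.02):
    # Small Gaussian weights, zero bias
    weight = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias


class ResidualPromptEncoder:
    """Hypothetical encoder: c' = c + W2 relu(W1 c + b1) + b2, applied per context vector."""

    def __init__(self, dim, hidden, rng):
        self.w1, self.b1 = init_layer(hidden, dim, rng)
        self.w2, self.b2 = init_layer(dim, hidden, rng)

    def __call__(self, context):
        out = []
        for c in context:
            h = relu(linear(c, self.w1, self.b1))
            delta = linear(h, self.w2, self.b2)
            # Residual connection keeps the original context and adds a learned correction
            out.append([a + d for a, d in zip(c, delta)])
        return out


def build_prompts(context, class_embeddings):
    # CoOp-style layout per class: [V_1 ... V_M][CLASS]
    return [list(context) + [cls] for cls in class_embeddings]


# Toy usage: 4 learnable context vectors of width 8, 3 classes (all values hypothetical)
rng = random.Random(0)
dim, n_ctx = 8, 4
context = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]
encoder = ResidualPromptEncoder(dim, hidden=16, rng=rng)
class_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
prompts = build_prompts(encoder(context), class_embeddings)
```

In this sketch the encoder's output depends only on the learnable context, not on the input image, so after training the reparameterized context could be computed once and cached; only the context vectors and encoder weights would receive gradients, with the CLIP text encoder kept frozen.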

Findings

01. Achieves 5.60% higher average accuracy on new classes than CoOp in the 16-shot setting.

02. Improves the harmonic mean of base- and new-class accuracy by 3% over CoOp.

03. Demonstrates efficiency across 8 benchmarks.
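The harmonic mean in the base-to-new evaluation protocol (introduced with CoOp/CoCoOp) combines base-class accuracy B and new-class accuracy N as H = 2BN / (B + N). A minimal sketch follows; the accuracy values are hypothetical, not results from the paper. It shows why H rewards balanced gains: two settings with the same arithmetic mean get different H.

```python
def harmonic_mean(base_acc: float, new_acc: float) -> float:
    """H = 2*B*N / (B + N) for base- and new-class accuracy (in %)."""
    if base_acc + new_acc == 0:
        return 0.0
    return 2.0 * base_acc * new_acc / (base_acc + new_acc)


# Hypothetical accuracies, both with an arithmetic mean of 70:
balanced = harmonic_mean(70.0, 70.0)  # 70.0
skewed = harmonic_mean(80.0, 60.0)    # about 68.57, penalized for the base/new gap
```

Because H is dominated by the smaller of the two accuracies, a method that raises new-class accuracy without sacrificing base-class accuracy tends to gain more in H than one that trades one for the other.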

Abstract

Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, prompts must be selected manually to improve alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is the major challenge for deploying such models in practice, since it requires domain expertise and is extremely time-consuming. To avoid non-trivial prompt engineering, the recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens. While CoOp achieves substantial improvements over manual prompts, its learned context generalizes poorly to wider unseen classes within the same dataset. In this work, we present Prompt Learning with Reparameterization Encoder (PRE) -…
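The role of the prompt in zero-shot transfer can be seen from how CLIP-style classification works: each class prompt (hand-written, or learned as in CoOp) is encoded into a text feature, and an image is assigned to the class whose text feature has the highest scaled cosine similarity with its image feature. The sketch below assumes the features are already computed; the 3-dimensional vectors are hypothetical stand-ins for real CLIP embeddings, and `logit_scale=100.0` mirrors the cap CLIP places on its learned temperature.

```python
import math


def normalize(v):
    # Scale a vector to unit L2 norm
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def zero_shot_probs(image_feature, text_features, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    img = normalize(image_feature)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
              for t in text_features]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical features: one image and three class-prompt embeddings
image = [1.0, 0.2, 0.0]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = zero_shot_probs(image, texts)
predicted = probs.index(max(probs))
```

Changing the prompt text changes `text_features` and therefore the decision boundaries, which is why prompt choice matters and why learning the prompt (and, in PRE, reparameterizing it) can improve accuracy without touching the frozen encoders.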

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Context Optimization · Contrastive Language-Image Pre-training · Balanced Selection