Read-only Prompt Optimization for Vision-Language Few-shot Learning

Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyung Choi, Sanghyeok Lee, and Hyunwoo J. Kim

arXiv:2308.14960 · cs.CV · November 13, 2023 · 1 citation

TL;DR

This paper introduces Read-only Prompt Optimization (RPO), a novel method for vision-language few-shot learning that enhances generalization and robustness by preventing internal representation shifts in pre-trained models.

Contribution

RPO leverages masked attention and special token initialization to improve prompt tuning, outperforming existing methods in various generalization and data-scarce scenarios.

Findings

1. RPO outperforms CLIP and CoCoOp in base-to-new generalization.
2. RPO demonstrates superior domain generalization and robustness.
3. RPO is more parameter-efficient and computationally less demanding.

Abstract

In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods aim to adapt the pre-trained models by introducing learnable prompts while keeping pre-trained weights frozen. However, learnable prompts can affect the internal representation within the self-attention module, which may negatively impact performance variance and generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO leverages masked attention to prevent the internal representation shift in the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized based on special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain…
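The read-only idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the prompts are appended after the original tokens, that "read-only" means prompt queries may attend to every token while original tokens never attend to prompts, and it uses a toy single-head attention with identity projections. The names `read_only_mask` and `masked_attention` are hypothetical.

```python
import math


def read_only_mask(n_orig, n_prompt):
    """Return mask[i][j], True when query i may attend to key j.

    Assumption: prompts occupy the last n_prompt positions. Original
    tokens are blocked from attending to prompts; prompts see everything.
    """
    n = n_orig + n_prompt
    mask = [[True] * n for _ in range(n)]
    for i in range(n_orig):
        for j in range(n_orig, n):
            mask[i][j] = False
    return mask


def masked_attention(x, mask):
    """Toy single-head self-attention with q = k = v = x (no learned weights)."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Scaled dot-product scores; blocked positions get -inf.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j, k in enumerate(x)
        ]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)]
        )
    return out


# Illustrative values only.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompts = [[0.5, -0.5]]

# Attention over the original tokens alone (nothing masked).
baseline = masked_attention(tokens, [[True] * 3 for _ in range(3)])

# Attention with one read-only prompt appended.
with_prompt = masked_attention(tokens + prompts, read_only_mask(3, 1))
```

Under these assumptions, the first three rows of `with_prompt` match `baseline`, so the representations of the original tokens are not shifted by the prompt, while the last row (the prompt) aggregates information from all tokens.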

Code & Models

Repositories

mlvlab/rpo (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Contrastive Language-Image Pre-training