Read-only Prompt Optimization for Vision-Language Few-shot Learning
Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyung Choi, Sanghyeok Lee, and Hyunwoo J. Kim

TL;DR
This paper introduces Read-only Prompt Optimization (RPO), a novel method for vision-language few-shot learning that enhances generalization and robustness by preventing internal representation shifts in pre-trained models.
Contribution
RPO leverages masked attention and special token initialization to improve prompt tuning, outperforming existing methods in various generalization and data-scarce scenarios.
Findings
RPO outperforms CLIP and CoCoOp in base-to-new generalization.
RPO demonstrates superior domain generalization and robustness.
RPO is more parameter-efficient and computationally less demanding.
Abstract
In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods aim to adapt the pre-trained models by introducing learnable prompts while keeping pre-trained weights frozen. However, learnable prompts can affect the internal representation within the self-attention module, which may negatively impact performance variance and generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO leverages masked attention to prevent the internal representation shift in the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized based on special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain…
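The masked-attention idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single attention head with identity query/key/value projections, and it assumes the mask blocks every query from attending to prompt keys, so prompts can read from the original tokens but never write into them. The names `read_only_mask`, `attention`, `tokens`, and `prompts` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def read_only_mask(n_orig, n_prompt):
    """Additive attention mask: 0.0 where attention is allowed, -inf where blocked.

    Assumption for this sketch: prompt positions are appended after the
    original tokens, and no query may attend to a prompt key.
    """
    n = n_orig + n_prompt
    return [[0.0 if j < n_orig else -math.inf for j in range(n)]
            for _ in range(n)]


def attention(x, mask=None):
    """Toy single-head self-attention with identity Q/K/V projections."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        if mask is not None:
            scores = [s + m for s, m in zip(scores, mask[i])]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x)))
                    for c in range(d)])
    return out


def max_shift(a, b):
    """Largest absolute element-wise difference between two token lists."""
    return max(abs(u - v) for r1, r2 in zip(a, b) for u, v in zip(r1, r2))


# Hypothetical toy inputs: three "pre-trained" tokens and one learnable prompt.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompts = [[2.0, -1.0]]

baseline = attention(tokens)                              # no prompts at all
unmasked = attention(tokens + prompts)                    # ordinary prompt tuning
masked = attention(tokens + prompts, read_only_mask(3, 1))  # read-only prompts

shift_unmasked = max_shift(baseline, unmasked[:3])
shift_masked = max_shift(baseline, masked[:3])
print("shift without mask:", shift_unmasked)
print("shift with read-only mask:", shift_masked)
```

In this sketch the outputs at the original token positions match the prompt-free baseline when the mask is applied, whereas the unmasked variant changes them. The prompt position still produces an output, computed only from the original tokens, which is the sense in which it is "read-only" here.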
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Contrastive Language-Image Pre-training
