Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen, Lingfeng Yang, Shuo Chen, Zhaowei Chen, Jiajun Liang, Xiang Li

TL;DR
This paper introduces Revisiting Prompt Pretraining (RPP), a framework that enhances prompt learning for vision-language models by improving fitting capacity and generalization through unshared prompt structures and soft label supervision, achieving state-of-the-art results.
Contribution
The paper proposes a novel RPP framework that improves prompt pretraining by unsharing prompt components and leveraging soft labels from a CLIP teacher, enhancing transferability and performance.
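To make "unsharing prompt components" concrete, here is a minimal sketch of one possible reading: in a single attention layer, the common practice prepends one learnable prompt to the tokens and derives query, key, and value from that same sequence, whereas the unshared variant prepends a separate learnable prompt for each of the three. This is an illustration under stated assumptions, not the authors' implementation. The single-head layout, the helper names (`matmul`, `attention`, `rand_matrix`), and the toy sizes are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) list-of-lists by a (k x m) list-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q_in, k_in, v_in, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (assumed layout, for illustration)."""
    q, k, v = matmul(q_in, w_q), matmul(k_in, w_k), matmul(v_in, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters or token features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, n_tokens, n_prompts = 4, 3, 2  # toy sizes, chosen only for the example
tokens = rand_matrix(n_tokens, dim, rng)
w_q, w_k, w_v = (rand_matrix(dim, dim, rng) for _ in range(3))

# Shared structure: one learnable prompt feeds the query, key, and value paths.
shared_prompt = rand_matrix(n_prompts, dim, rng)
seq = shared_prompt + tokens
out_shared = attention(seq, seq, seq, w_q, w_k, w_v)

# Unshared structure: three independent learnable prompts, one per path.
p_q, p_k, p_v = (rand_matrix(n_prompts, dim, rng) for _ in range(3))
out_unshared = attention(p_q + tokens, p_k + tokens, p_v + tokens, w_q, w_k, w_v)

# The unshared variant has three times as many learnable prompt parameters.
n_shared_params = n_prompts * dim
n_unshared_params = 3 * n_prompts * dim
```

Both variants produce an output of the same shape. In this sketch the only difference is that the unshared variant has three independently trainable prompt matrices instead of one, which is consistent with the claim that unsharing raises fitting capacity.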
Findings
RPP achieves SOTA performance across various benchmarks.
Unshared prompt structures increase model fitting capacity.
Soft label supervision improves generalization.
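The soft-label idea in the findings above can be illustrated with a small sketch. The summary does not state which loss RPP uses, so the KL divergence between teacher and student distributions below is an assumption (a common choice in knowledge distillation), and the function names, the `temperature` parameter, and the example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature is an assumed, optional knob."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student): the teacher's full distribution is the target."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)


def hard_label_loss(student_logits, label):
    """Standard cross-entropy against a single one-hot class index."""
    q = softmax(student_logits)
    return -math.log(q[label])


# Made-up logits over three classes. The teacher rates classes 0 and 1 as
# similar, which is information a one-hot label for class 0 would discard.
teacher_logits = [4.0, 3.5, 0.5]
student_logits = [2.0, 1.0, 0.0]

soft = soft_label_loss(student_logits, teacher_logits)
hard = hard_label_loss(student_logits, label=0)
```

With a hard label the student is pushed toward a single class. With the teacher's soft label it is also rewarded for reproducing the relative similarity between classes, which is one plausible reason soft supervision would help generalization.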
Abstract
Prompt learning is an effective method for customizing Vision-Language Models (VLMs) for various downstream tasks, tuning only a small number of input prompt-token parameters. Recently, prompt pretraining on large-scale datasets (e.g., ImageNet-21K) has played a crucial role in prompt learning for universal visual discrimination. However, we revisit this practice and observe that a limited set of learnable prompts risks underfitting the extensive images seen during prompt pretraining, while also leading to poor generalization. To address these issues, we propose a general framework termed Revisiting Prompt Pretraining (RPP), which aims to improve fitting and generalization ability from two aspects: prompt structure and prompt supervision. For prompt structure, we break the restriction in common practice where query, key, and value vectors are derived from the shared…
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
