Modular Prompt Learning Improves Vision-Language Models
Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao

TL;DR
This paper introduces Modular Prompt Learning (MPL), a novel approach that enhances vision-language models by preserving information in deep prompts, leading to improved generalization and cross-dataset performance.
Contribution
The paper proposes MPL, a new prompt learning method that maintains prompt information across transformer layers, outperforming existing deep prompt techniques.
Findings
Achieves a 0.7% average performance gain on base-to-new generalization across 11 datasets.
Largest improvement of 10.7% on the EuroSAT dataset.
Effectively preserves prompt information, enhancing model performance.
Abstract
Pre-trained vision-language models are able to interpret visual concepts and language semantics. Prompt learning, a method of constructing prompts for text encoders or image encoders, elicits the potential of pre-trained models and readily adapts them to new scenarios. Compared to fine-tuning, prompt learning enables the model to achieve comparable or better performance using fewer trainable parameters. In addition, prompt learning freezes the pre-trained model and avoids the catastrophic forgetting issue of fine-tuning. Continuous prompts inserted into the input of every transformer layer (i.e. deep prompts) can improve the performance of pre-trained models on downstream tasks. For the i-th transformer layer, the inserted prompts replace the previously inserted prompts in the (i-1)-th layer. Although the self-attention mechanism contextualizes newly inserted prompts for the current layer…
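The replacement behaviour of deep prompts described in the abstract can be illustrated with a minimal sketch. This is not the authors' MPL method or their code; it only shows the baseline deep-prompt scheme in which the prompts inserted at layer i overwrite whatever the prompt slots held after layer i-1. The names toy_layer and deep_prompt_forward are hypothetical, and the "layer" is a parameter-free self-attention stand-in for a frozen transformer block.

```python
import math

def toy_layer(x):
    """Hypothetical stand-in for a frozen transformer layer:
    parameter-free self-attention followed by a residual connection."""
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores of this token against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        mixed = [sum(e / z * v[j] for e, v in zip(exps, x)) for j in range(d)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])
    return out

def deep_prompt_forward(tokens, prompts, num_layers):
    """tokens: list of N vectors; prompts[i]: list of P vectors for layer i.
    Assumes every layer uses the same number of prompts P."""
    n_prompts = len(prompts[0])
    x = prompts[0] + tokens  # prepend the first layer's prompts
    for i in range(num_layers):
        if i > 0:
            # Deep prompting: the outputs at the prompt positions from
            # layer i-1 are discarded and replaced by layer i's prompts.
            x = prompts[i] + x[n_prompts:]
        x = toy_layer(x)
    return x

tokens = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[[0.5, 0.5]], [[-0.5, 0.5]], [[0.1, -0.2]]]
out = deep_prompt_forward(tokens, prompts, num_layers=3)
print(len(out), len(out[0]))
```

In this sketch, earlier prompts still influence the non-prompt tokens through attention, but the contextualized prompt vectors themselves do not survive past their own layer, which is the information loss the abstract alludes to.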
Taxonomy
Topics: Natural Language Processing Techniques
