Modular Prompt Learning Improves Vision-Language Models

Zhenhan Huang; Tejaswini Pedapati; Pin-Yu Chen; Jianxi Gao

arXiv:2502.14125 · cs.CV · February 21, 2025

TL;DR

This paper introduces Modular Prompt Learning (MPL), a novel approach that enhances vision-language models by preserving information in deep prompts, leading to improved generalization and cross-dataset performance.

Contribution

The paper proposes MPL, a new prompt learning method that maintains prompt information across transformer layers, outperforming existing deep prompt techniques.

Findings

1. Achieves a 0.7% average performance gain on base-to-new generalization across 11 datasets.
2. The largest single-dataset improvement is 10.7%, on EuroSAT.
3. Effectively preserves prompt information, enhancing model performance.

Abstract

Pre-trained vision-language models are able to interpret visual concepts and language semantics. Prompt learning, a method of constructing prompts for text encoders or image encoders, elicits the potential of pre-trained models and readily adapts them to new scenarios. Compared to fine-tuning, prompt learning enables the model to achieve comparable or better performance using fewer trainable parameters. In addition, prompt learning freezes the pre-trained model and so avoids the catastrophic forgetting that fine-tuning can cause. Continuous prompts inserted into the input of every transformer layer (i.e., deep prompts) can improve the performance of pre-trained models on downstream tasks. For the $i$-th transformer layer, the inserted prompts replace the prompts previously inserted in the $(i-1)$-th layer. Although the self-attention mechanism contextualizes newly inserted prompts for the current layer…
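The layer-wise replacement that the abstract describes for deep prompts can be sketched in a few lines. This is a toy illustration of generic deep prompting only, not the MPL method and not code from the authors' repository. The names `toy_layer` and `deep_prompt_forward` are hypothetical, and the "layer" is a stand-in (uniform attention plus a residual connection) chosen so the example runs with the standard library alone.

```python
from typing import List

Vector = List[float]


def toy_layer(tokens: List[Vector]) -> List[Vector]:
    """Stand-in for a transformer layer (assumption): every token attends
    uniformly to all tokens, and the result is added back as a residual."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[t[d] + mean[d] for d in range(dim)] for t in tokens]


def deep_prompt_forward(
    inputs: List[Vector], layer_prompts: List[List[Vector]]
) -> List[Vector]:
    """Run one toy layer per prompt set, inserting fresh prompts at each layer."""
    hidden = inputs
    for prompts in layer_prompts:
        # Prepend this layer's (learnable) prompts to the non-prompt tokens.
        tokens = prompts + hidden
        out = toy_layer(tokens)
        # Drop the outputs at the prompt positions: the next layer inserts its
        # own prompts, so the prompts from this layer are replaced rather than
        # carried forward.
        hidden = out[len(prompts):]
    return hidden


# Two input tokens, two layers, one prompt vector per layer.
inputs = [[1.0, 0.0], [0.0, 1.0]]
layer_prompts = [[[0.5, 0.5]], [[-0.5, 0.5]]]
outputs = deep_prompt_forward(inputs, layer_prompts)
print(len(outputs), outputs)
```

In this sketch the prompts of one layer influence later layers only indirectly, through what the non-prompt tokens absorbed from them during attention; the prompt outputs themselves are discarded at every layer boundary.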

Code & Models

Repositories

Zhenhan-Huang/Modular-Prompt-Learning (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques