Differentiable Prompt Learning for Vision Language Models

Zhenhan Huang; Tejaswini Pedapati; Pin-Yu Chen; Jianxi Gao

arXiv:2501.00457·cs.LG·January 3, 2025

Open Access

TL;DR

This paper introduces Differentiable Prompt Learning (DPL), an automated method to optimize prompt configurations in vision-language models, significantly improving downstream task performance over manual prompt strategies.

Contribution

The paper proposes DPL, a novel optimization-based approach for automatic prompt configuration, enhancing deep continuous prompt design in pre-trained models.

Findings

01. DPL achieves a 2.60% average accuracy boost on 11 datasets.

02. DPL effectively finds high-confidence prompt configurations with limited data.

03. The method is compatible with existing sophisticated prompt designs.

Abstract

Prompt learning is an effective way to exploit the potential of large-scale pre-trained foundational models. Continuous prompts parameterize the context tokens in a prompt by turning them into differentiable vectors. Deep continuous prompts insert prompts not only at the input but also into the intermediate hidden representations. Manually designed deep continuous prompts yield a remarkable improvement over the zero-shot pre-trained model on downstream tasks. How to automate continuous prompt design remains underexplored, and a fundamental question arises: is the manually designed deep prompt strategy optimal? To answer this question, we propose a method dubbed differentiable prompt learning (DPL). DPL is formulated as an optimization problem that automatically determines the optimal context length of the prompt to be added to each layer, where the objective is to maximize…
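As a rough illustration of what "automatically determining the context length of the prompt for each layer" could look like, here is a minimal sketch. It assumes a softmax relaxation over a small set of candidate lengths, in the spirit of differentiable architecture search; the abstract does not state that DPL works this way. The candidate set `CANDIDATE_LENGTHS`, the function names, and the logit values are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical search space: candidate context lengths available to every layer.
CANDIDATE_LENGTHS = [0, 2, 4, 8]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_length(logits):
    """Soft context length: softmax-weighted mix of the candidate lengths.

    During a search phase, a quantity like this stays differentiable with
    respect to the logits, so the logits can be trained by gradient descent.
    """
    weights = softmax(logits)
    return sum(w * n for w, n in zip(weights, CANDIDATE_LENGTHS))


def select_lengths(layer_logits):
    """Discretize after the search: keep the top-weighted candidate per layer."""
    chosen = []
    for logits in layer_logits:
        best = max(range(len(logits)), key=lambda i: logits[i])
        chosen.append(CANDIDATE_LENGTHS[best])
    return chosen


# Made-up logits for a 3-layer encoder (illustrative values, not results).
layer_logits = [
    [0.1, 0.3, 1.2, 0.0],
    [0.9, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]
config = select_lengths(layer_logits)
print(config)
```

In this sketch, a layer whose selected length is 0 receives no prompt at all, so the per-layer choice also decides how deep the prompts go.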

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training