MoPD: Mixture-of-Prompts Distillation for Vision-Language Models

Yang Chen; Shuai Fu; Yu Zhang

arXiv:2412.19087 · cs.CV · September 15, 2025

TL;DR

MoPD is a soft prompt learning approach that distills knowledge from hard prompts into soft prompts, improving vision-language models' generalization to unseen classes and countering overfitting to seen classes.

Contribution

Introduces MoPD, a mixture-of-prompts distillation method that transfers knowledge from handcrafted hard prompts to soft prompts, improving performance on unseen classes.

Findings

01. Outperforms state-of-the-art baselines on unseen classes

02. Effectively transfers knowledge from hard to soft prompts

03. Improves generalization in vision-language models

Abstract

Soft prompt learning methods are effective for adapting vision-language models (VLMs) to downstream tasks. Nevertheless, empirical evidence shows that existing methods tend to overfit seen classes and perform worse on unseen classes. This limitation stems from the inherent bias of the training data towards the seen classes. To address this issue, we propose a novel soft prompt learning method, named Mixture-of-Prompts Distillation (MoPD), which transfers useful knowledge from manually crafted hard prompts (a.k.a. teacher prompts) to the learnable soft prompt (a.k.a. student prompt), thereby enhancing the generalization ability of soft prompts on unseen classes. Moreover, MoPD uses a gating network that learns to select the hard prompts used for prompt distillation. Extensive experiments demonstrate that the proposed MoPD…
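The abstract describes the mechanism only at a high level, so the following is a minimal sketch of one way gated teacher-to-student prompt distillation could look, not the authors' implementation. It assumes that each prompt (teacher or student) yields a vector of class logits for an image, that the gating network outputs one score per hard prompt, and that distillation uses a gate-weighted KL divergence. The function names, the soft (rather than top-k) selection, and the temperature parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mixture_distillation_loss(student_logits, teacher_logits_list,
                              gate_scores, temperature=1.0):
    """Gate-weighted distillation loss (illustrative assumption).

    student_logits:      class logits from the learnable soft prompt
    teacher_logits_list: one list of class logits per hard prompt
    gate_scores:         one raw score per hard prompt, standing in for
                         the output of a gating network
    """
    # Assumption: soft selection, i.e. gate scores become mixture weights.
    gate_weights = softmax(gate_scores)
    student_probs = softmax(student_logits, temperature)

    loss = 0.0
    for weight, teacher_logits in zip(gate_weights, teacher_logits_list):
        teacher_probs = softmax(teacher_logits, temperature)
        # Pull the student distribution towards each selected teacher.
        loss += weight * kl_divergence(teacher_probs, student_probs)
    return loss, gate_weights


if __name__ == "__main__":
    student = [0.0, 0.0, 0.0]
    teachers = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
    gates = [1.5, -0.5]  # hypothetical gating scores
    loss, weights = mixture_distillation_loss(student, teachers, gates)
    print("gate weights:", weights)
    print("distillation loss:", loss)
```

In a real training loop this term would presumably be combined with the usual classification loss on seen classes, and the gate scores would come from a learned network rather than fixed numbers; the abstract does not say how the two are balanced.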

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques