PLPP: Prompt Learning with Perplexity Is Self-Distillation for Vision-Language Models
Biao Liu, Wenyi Fang, Xiaoyu Wu, Yang Zheng, Zheng Hu, Bo Yuan

TL;DR
PLPP introduces a prompt-regularization technique based on perplexity loss. It reduces overfitting when tuning prompts for vision-language models such as CLIP and improves downstream task performance through self-distillation and mutual learning strategies.
Contribution
The paper proposes PLPP, a prompt learning method that employs perplexity-based regularization and self-distillation to improve vision-language model fine-tuning.
Findings
PLPP outperforms existing prompt learning methods on multiple classification tasks.
Using perplexity loss reduces overfitting in prompt tuning.
Mutual self-distillation accelerates convergence and enhances performance.
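The Findings credit "mutual self-distillation" with faster convergence. However, this page does not say which two predictions PLPP distills between or which divergence it uses. As a hedged illustration only, the sketch below assumes a deep-mutual-learning-style symmetric KL divergence between the logits of two hypothetical branches (`branch_a`, `branch_b`). All function names are ours, not the paper's.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_distillation_loss(logits_a, logits_b, temperature=1.0):
    """ASSUMED form: symmetric KL, so each branch is pulled toward the other.
    This is a generic mutual-learning loss, not PLPP's confirmed objective."""
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    return kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a)


# Hypothetical logits from two branches over the same 3 classes.
branch_a = [2.0, 0.5, -1.0]
branch_b = [1.5, 1.0, -0.5]
print(mutual_distillation_loss(branch_a, branch_b))
```

In an autograd framework, each KL term would usually treat the other branch's distribution as a fixed (detached) target. This plain-Python version computes only the loss value and no gradients.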
Abstract
Pre-trained Vision-Language (VL) models such as CLIP have demonstrated excellent performance across numerous downstream tasks. A recent method, Context Optimization (CoOp), further improves the performance of VL models on downstream tasks by introducing prompt learning: CoOp optimizes a set of learnable vectors, a.k.a. the prompt, while keeping the whole CLIP model frozen. However, relying solely on the CLIP loss to fine-tune prompts can produce models that overfit the downstream task. To address this issue, we propose a plug-in prompt-regularization method called PLPP (Prompt Learning with PerPlexity), which uses perplexity loss to regularize prompt learning. PLPP computes the perplexity of prompts in two steps: (a) calculating the cosine similarity between the weights of the embedding layer and the prompts to obtain labels, and (b) introducing a language model (LM) head that…
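Step (a) of the abstract can be sketched directly: each learnable prompt vector is labeled with the vocabulary token whose embedding row is most cosine-similar to it. Step (b) is cut off in the abstract. Therefore `hypothetical_lm_head` below is a stand-in we invented: a weight-tied linear head applied directly to the prompt vectors. It is not PLPP's actual head. Perplexity is then the standard exp of the mean negative log-likelihood of those labels. Because argmax is not differentiable, the labels act as fixed targets. This is our reading of "to get labels", not a confirmed detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pseudo_labels(prompts, embedding_weight):
    """Step (a): label each prompt vector with the index of the vocabulary
    embedding row that is most cosine-similar to it."""
    labels = []
    for p in prompts:
        sims = [cosine(p, row) for row in embedding_weight]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


def hypothetical_lm_head(hidden, embedding_weight):
    """HYPOTHETICAL stand-in for step (b): a weight-tied linear head that
    scores a vector against every vocabulary row (dot product = logit).
    The real PLPP head is not described in the truncated abstract."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding_weight]


def log_softmax(logits):
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def perplexity(logits_per_position, labels):
    """Perplexity = exp(mean negative log-likelihood of the labels)."""
    nll = [-log_softmax(lg)[y] for lg, y in zip(logits_per_position, labels)]
    return math.exp(sum(nll) / len(nll))


# Toy example: vocabulary of 3 tokens with 2-dim embeddings, 2 prompt vectors.
embedding_weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
prompts = [[0.9, 0.1], [0.2, 0.8]]

labels = pseudo_labels(prompts, embedding_weight)  # [0, 1]
logits = [hypothetical_lm_head(p, embedding_weight) for p in prompts]
print(labels, perplexity(logits, labels))
```

Lower perplexity means the prompt vectors sit closer to real token embeddings. Adding such a term to the CLIP loss is one plausible way to regularize prompts. The abstract does not give the weighting between the two losses.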
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training · Context Optimization · Contrastive Language-Image Pre-training
