PLPP: Prompt Learning with Perplexity Is Self-Distillation for Vision-Language Models

Biao Liu; Wenyi Fang; Xiaoyu Wu; Yang Zheng; Zheng Hu; Bo Yuan

arXiv:2412.15277 · cs.CL · December 23, 2024

Open Access

TL;DR

PLPP regularizes prompt learning with a perplexity loss. The regularizer reduces overfitting and improves the downstream performance of vision-language models such as CLIP, and it is combined with self-distillation and mutual learning strategies.

Contribution

The paper proposes PLPP, a prompt learning method that employs perplexity-based regularization and self-distillation to improve vision-language model fine-tuning.

Findings

01. PLPP outperforms existing prompt learning methods on multiple classification tasks.

02. Using perplexity loss reduces overfitting in prompt tuning.

03. Mutual self-distillation accelerates convergence and enhances performance.

Abstract

Pre-trained Vision-Language (VL) models such as CLIP have demonstrated excellent performance across numerous downstream tasks. A recent method, Context Optimization (CoOp), further improves the performance of VL models on downstream tasks by introducing prompt learning. CoOp optimizes a set of learnable vectors, known as the prompt, and freezes the whole CLIP model. However, relying solely on the CLIP loss to fine-tune prompts can lead to models that are prone to overfitting on downstream tasks. To address this issue, we propose a plug-in prompt-regularization method called PLPP (Prompt Learning with PerPlexity), which uses a perplexity loss to regularize prompt learning. PLPP designs a two-step operation to compute the perplexity for prompts: (a) calculating the cosine similarity between the weights of the embedding layer and the prompts to get labels, (b) introducing a language model (LM) head that…
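The two-step operation named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract is truncated before it describes the LM head, so the head here is assumed to be a plain linear map onto the vocabulary, and perplexity is assumed to be the usual exponential of the mean token-level cross-entropy. All names (`cosine_labels`, `perplexity`, `lm_head`) and the toy dimensions are hypothetical.

```python
import numpy as np


def cosine_labels(prompts, embedding_weight):
    """Step (a): label each prompt vector with its most cosine-similar vocabulary row."""
    p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    e = embedding_weight / np.linalg.norm(embedding_weight, axis=1, keepdims=True)
    sim = p @ e.T  # shape: (num_prompt_tokens, vocab_size)
    return sim.argmax(axis=1)


def perplexity(logits, labels):
    """Perplexity as exp of the mean cross-entropy between logits and labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(np.exp(nll.mean()))


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
vocab_size, dim, n_ctx = 50, 16, 4

embedding_weight = rng.normal(size=(vocab_size, dim))  # stands in for the frozen embedding layer
prompts = rng.normal(size=(n_ctx, dim))                # stands in for the learnable prompt vectors

labels = cosine_labels(prompts, embedding_weight)

# ASSUMPTION: a linear LM head mapping prompt vectors to vocabulary logits.
lm_head = rng.normal(size=(dim, vocab_size)) * 0.1
logits = prompts @ lm_head

ppl = perplexity(logits, labels)
print("labels:", labels, "perplexity:", ppl)
```

In a training loop, a term of this kind would be added to the CLIP loss as a regularizer on the prompt vectors; how the two losses are weighted is not stated in the text above.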

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · Context Optimization · Contrastive Language-Image Pre-training