Evolving Prompt Adaptation for Vision-Language Models
Enming Zhang, Jiayang Li, Yanru Wu, Zhenyu Liu, Yang Li

TL;DR
EvoPrompt is an evolutionary prompt adaptation framework for vision-language models. It improves few-shot learning performance while preserving zero-shot capabilities by controlling how prompts evolve during training and preventing the loss of pre-trained knowledge.
Contribution
The paper presents EvoPrompt, a method that explicitly guides prompt evolution, using a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts and regularization to avoid catastrophic forgetting in VLMs.
Findings
Achieves state-of-the-art few-shot learning performance.
Effectively preserves zero-shot capabilities of pre-trained models.
Demonstrates robustness across various downstream tasks.
Abstract
The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for stable, knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic…
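The abstract is cut off before it describes how the directional and magnitude components are defined, so the following is only a minimal sketch of one common way to perform such a split, not the authors' implementation. It assumes the low-rank update is a product of two small matrices and that "magnitude" means the per-column Euclidean norm and "direction" the corresponding unit vector. All names (`matmul`, `decompose`, `recompose`, `B`, `A`) are hypothetical.

```python
import math

def matmul(B, A):
    """Multiply B (d_out x r) by A (r x d_in) using plain nested lists."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def decompose(delta, eps=1e-12):
    """Split each column of delta into a norm (magnitude) and a unit vector (direction).

    Assumption: column-wise L2 norm; the paper may define the split differently.
    """
    n_rows, n_cols = len(delta), len(delta[0])
    magnitude = [math.sqrt(sum(delta[i][j] ** 2 for i in range(n_rows)))
                 for j in range(n_cols)]
    direction = [[delta[i][j] / (magnitude[j] + eps) for j in range(n_cols)]
                 for i in range(n_rows)]
    return magnitude, direction

def recompose(magnitude, direction):
    """Rebuild the update by scaling each unit-norm column by its magnitude."""
    return [[magnitude[j] * row[j] for j in range(len(row))] for row in direction]

# Hypothetical rank-2 update: delta = B @ A
B = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # 3 x 2
A = [[3.0, 0.0], [0.0, 1.0]]               # 2 x 2
delta = matmul(B, A)

magnitude, direction = decompose(delta)

# Keep the direction fixed and change only the magnitude.
rescaled = recompose([0.5 * m for m in magnitude], direction)
```

Once the two components are separated, the direction can be held fixed while only the magnitude changes (or the reverse), which is the kind of control over the update that the abstract attributes to the evolutionary training strategy.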
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
