FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation
Zherui Zhang, Jiaxin Wu, Changwei Wang, Rongtao Xu, Longzhao Huang, Wenhao Xu, Wenbo Xu, Li Guo, Shibiao Xu

TL;DR
This paper introduces FDBPL, a prompt learning method that accelerates training and enhances zero-shot generalization for vision-language models by sharing supervision and exploiting region-aware prompts.
Contribution
FDBPL proposes a novel region-aware prompt learning paradigm with mutual learning, achieving faster training and improved generalization without sacrificing parameter efficiency.
Findings
Achieves 2.2× faster training.
Improves zero-shot recognition performance.
Demonstrates superior results across 11 datasets.
Abstract
Prompt learning is a parameter-efficient method that has been widely adopted to adapt Vision-Language Models (VLMs) to downstream tasks. While hard-prompt design requires domain expertise and iterative optimization, soft-prompt methods rely heavily on task-specific hard labels, limiting their generalization to unseen categories. Recent distillation-based prompt learning methods improve generalization by exploiting larger teacher VLMs and unsupervised knowledge transfer, yet their repeated online inference of the teacher model sacrifices the inherent training-efficiency advantage of prompt learning. In this paper, we propose Faster Distillation-Based Prompt Learning (FDBPL), which addresses these issues by sharing soft supervision contexts across multiple training…
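The efficiency concern raised in the abstract (re-running a large teacher on every training step) can be illustrated with a generic sketch. This is not the FDBPL algorithm, whose details are cut off above. It only assumes that teacher soft labels can be computed once per sample and reused across epochs. The names `CachedTeacher`, `teacher_fn`, and `distill_loss` are hypothetical and chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


class CachedTeacher:
    """Hypothetical wrapper: runs the teacher once per sample and stores
    its soft labels, so later epochs reuse them without teacher inference."""

    def __init__(self, teacher_fn, temperature=1.0):
        self.teacher_fn = teacher_fn  # assumed: maps a sample to class logits
        self.temperature = temperature
        self.cache = {}
        self.calls = 0  # counts actual teacher forward passes

    def soft_label(self, sample_id, sample):
        if sample_id not in self.cache:
            self.calls += 1
            logits = self.teacher_fn(sample)
            self.cache[sample_id] = softmax(logits, self.temperature)
        return self.cache[sample_id]


def distill_loss(student_logits, teacher_probs, temperature=1.0):
    """Distillation loss: KL between teacher and student distributions."""
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)
```

With this wrapper, the number of teacher forward passes equals the number of distinct samples rather than samples times epochs. The cost is the memory needed to hold one probability vector per sample, and the cached labels no longer reflect per-step data augmentation unless the cache is keyed on the augmented view as well.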
