Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
Sua Lee, Kyubum Shin, Jung Ho Park

TL;DR
This paper introduces DeMul, a novel description-free multi-prompt learning method that distills knowledge from large language models into continuous prompts, improving robustness and performance in vision-language tasks without relying on discrete descriptions.
Contribution
DeMul eliminates the need for description extraction by directly distilling LLM knowledge into continuous prompts, and demonstrates effective prompt weighting in a multi-prompt setting.
Findings
Achieves superior performance across 11 recognition datasets.
Effectively incorporates prompt weighting to reflect prompt importance.
Outperforms existing prompt learning methods in robustness and accuracy.
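The prompt weighting mentioned in the findings can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the summary does not say how the weights are parameterised, so the per-class softmax over `weight_logits`, the function name `weighted_multi_prompt_logits`, the `temperature` value, and the random stand-in features are all assumptions made for illustration. The sketch scores an image against several prompt embeddings per class and combines the cosine similarities with learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def weighted_multi_prompt_logits(image_feats, prompt_feats, weight_logits,
                                 temperature=0.01):
    """Hypothetical weighted multi-prompt scoring (illustrative only).

    image_feats:   (B, D) image embeddings
    prompt_feats:  (C, M, D) text embeddings, M prompts per class
    weight_logits: (C, M) unnormalised importance of each prompt (assumed form)
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(prompt_feats)
    # Cosine similarity between every image and every prompt of every class.
    sims = np.einsum("bd,cmd->bcm", img, txt)        # (B, C, M)
    weights = softmax(weight_logits, axis=-1)        # (C, M), rows sum to 1
    # Weighted sum over the M prompts of each class, scaled by the temperature.
    return (sims * weights[None, :, :]).sum(axis=-1) / temperature

# Random stand-ins for encoder outputs: 4 images, 3 classes, 5 prompts, dim 8.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(4, 8))
prompt_feats = rng.normal(size=(3, 5, 8))
weight_logits = np.zeros((3, 5))                     # uniform weights to start

logits = weighted_multi_prompt_logits(image_feats, prompt_feats, weight_logits)
predictions = logits.argmax(axis=-1)                 # one class index per image
```

With all-zero `weight_logits` the weights are uniform, so the score reduces to a plain average over prompts; training the weights would let more informative prompts contribute more to each class score.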
Abstract
Recent advances in pre-trained Vision Language Models (VLM) have shown promising potential for effectively adapting to downstream tasks through prompt learning, without the need for additional annotated paired datasets. To supplement the text information in VLM trained on correlations with vision data, new approaches leveraging Large Language Models (LLM) in prompts have been proposed, enhancing robustness to unseen and diverse data. Existing methods typically extract text-based responses (i.e., descriptions) from LLM to incorporate into prompts; however, this approach suffers from high variability and low reliability. In this work, we propose Description-free Multi-prompt Learning (DeMul), a novel method that eliminates the process of extracting descriptions and instead directly distills knowledge from LLM into prompts. By adopting a description-free approach, prompts can encapsulate…
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper is well-crafted with clear expressions, making it easy to understand. 2. The description-free distillation approach for training learnable prompts is interesting, novel, and effective. It is also more cost-effective than description-needed methods, which require multiple queries to models like GPT for comprehensive descriptions. 3. The study on diverse semantics and the importance of multiple learnable prompts is well-motivated. 4. The extensive few-shot learning experiments and
1. Although the paper demonstrates superior few-shot performance, the notable aspect is the open-vocabulary ability of CLIP. Since 2023, literature has focused on improving this aspect. Additional experiments on this topic are suggested. 2. The authors should discuss whether the proposed distillation method works when ground-truth training data is missing. Solely text-side fine-tuning may lead to misalignment with the visual side. If it doesn't work, this limitation should be addressed. 3. The
1. The paper explores an interesting direction: enriching prompts through cyclic distillation from LLM embedding spaces. This avoids explicit description queries for the class names of a given dataset. 2. The description-free approach with multi-prompt weighting seems to be effective in few-shot image classification. 3. The paper is well organized and easy to follow.
1. It seems that the pretraining has to be done on a dataset that covers all the category names in the evaluation. This would significantly limit the generalizability of the method. Also, are class names a good attribute for mapping the LLM embeddings? 2. As mapping functions need to be pretrained, what is the computational cost incurred? Also, it seems that the comprehensive dataset requires many samples from a long list of class names to cover all datasets. Such an approach is prone to nois
Originality: The description-free distillation is a novel concept that cleverly addresses a key limitation of existing LLM-enhanced prompt learning methods. The weighted multi-prompt learning strategy further enhances the originality of the approach. Quality: The paper presents compelling empirical results across 11 diverse datasets, with DeMul consistently outperforming existing zero-shot and few-shot methods, including state-of-the-art techniques like GalLoP. The comprehensive experimental se
Limited analysis of the mapping function: While the paper introduces the concept of a mapping function, the exploration of its properties remains superficial, lacking quantitative analysis to assess its effectiveness. Dependence on training data distribution: The paper acknowledges the potential sensitivity of DeMul to the training data distribution but lacks an in-depth investigation of this sensitivity. Shared learnable vectors: While improving memory efficiency, shared learnable vectors mig
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
