Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling
Donggeun Kim, Yujin Jo, Myungjoo Lee, Taesup Kim

TL;DR
This paper introduces Group-wise Prompt Ensemble (GPE), a novel method to enhance CLIP's ability to incorporate new domain knowledge while maintaining zero-shot performance, improving adaptability and robustness across diverse datasets.
Contribution
The paper presents GPE, a prompt ensemble learning approach that effectively integrates new domain knowledge into CLIP without compromising its zero-shot capabilities.
Findings
GPE improves CLIP's adaptability to new domains.
GPE outperforms existing methods in cross-dataset transfer tasks.
GPE enhances robustness against data distribution shifts.
Abstract
The advancement of vision-language models, particularly the Contrastive Language-Image Pre-training (CLIP) model, has revolutionized the field of machine learning by enabling robust zero-shot learning capabilities. These capabilities allow models to understand and respond to previously unseen data without task-specific training. However, adapting CLIP to integrate specialized knowledge from various domains while retaining its zero-shot capabilities remains a significant challenge. To address this, we introduce a novel prompt ensemble learning approach called Group-wise Prompt Ensemble (GPE). This method aims to enhance CLIP's zero-shot capabilities by incorporating new domain knowledge while improving its adaptability and robustness against data distribution shifts. Our approach hinges on three main strategies: prompt grouping with masked attention to optimize CLIP's adaptability while…
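To make the general idea of prompt ensembling concrete, the following is a minimal sketch of a generic ensemble over several groups of text (prompt) embeddings, scored against one image embedding with cosine similarity and a softmax. It is not the authors' GPE implementation: the function names, the random placeholder embeddings, and the `temperature` value are all assumptions for illustration, and the masked-attention grouping mentioned in the abstract is not modeled here.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(image_emb, text_emb_groups, temperature=0.01):
    """Average per-class similarities across prompt groups, then apply softmax.

    image_emb: one image embedding (list of floats).
    text_emb_groups: list of groups; each group holds one embedding per class.
    temperature: hypothetical scaling value, chosen only for this sketch.
    """
    n_classes = len(text_emb_groups[0])
    logits = []
    for c in range(n_classes):
        sims = [cosine(image_emb, group[c]) for group in text_emb_groups]
        logits.append(sum(sims) / len(sims) / temperature)
    return softmax(logits)


# Placeholder data: random vectors stand in for real CLIP embeddings.
random.seed(0)
dim, n_classes, n_groups = 8, 5, 3
image_emb = [random.gauss(0, 1) for _ in range(dim)]
text_emb_groups = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]
    for _ in range(n_groups)
]
probs = ensemble_probs(image_emb, text_emb_groups)
```

In practice the placeholder vectors would be replaced by embeddings from the image and text encoders, with each group corresponding to a different set of prompts. Averaging similarities is only one possible way to combine groups.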
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training
