Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling

Donggeun Kim, Yujin Jo, Myungjoo Lee, Taesup Kim

arXiv:2412.07077 · cs.CV · December 11, 2024

TL;DR

This paper introduces Group-wise Prompt Ensemble (GPE), a novel method to enhance CLIP's ability to incorporate new domain knowledge while maintaining zero-shot performance, improving adaptability and robustness across diverse datasets.

Contribution

The paper presents GPE, a prompt ensemble learning approach that effectively integrates new domain knowledge into CLIP without compromising its zero-shot capabilities.
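For context, the sketch below shows the standard hand-crafted form of prompt ensembling used with CLIP for zero-shot classification: each class's text embeddings from several prompt templates are unit-normalized, averaged, and renormalized, and an image is assigned to the class with the highest temperature-scaled cosine similarity. This is not GPE's learned group-wise ensemble, which this page does not detail. The random arrays stand in for CLIP encoder outputs, and the names `ensemble_class_embeddings` and `zero_shot_probs`, as well as the `logit_scale` value of 100, are assumptions for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def ensemble_class_embeddings(prompt_embs):
    """prompt_embs: (num_classes, num_prompts, dim) text embeddings.

    Returns (num_classes, dim): the mean of the unit-normalized prompt
    embeddings for each class, renormalized to unit length.
    """
    return l2_normalize(l2_normalize(prompt_embs).mean(axis=1))


def zero_shot_probs(image_emb, class_embs, logit_scale=100.0):
    """Cosine similarity between one image embedding and every class
    embedding, multiplied by a temperature (assumed value) and softmaxed."""
    logits = logit_scale * (class_embs @ l2_normalize(image_emb))
    logits = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


# Placeholder data standing in for CLIP text/image encoder outputs.
rng = np.random.default_rng(0)
num_classes, num_prompts, dim = 3, 4, 32
prompt_embs = rng.normal(size=(num_classes, num_prompts, dim))
class_embs = ensemble_class_embeddings(prompt_embs)

# Hypothetical image embedding placed close to class 1's ensembled embedding.
image_emb = class_embs[1] + 0.05 * rng.normal(size=dim)
probs = zero_shot_probs(image_emb, class_embs)
print(probs.argmax())  # index of the predicted class
```

Averaging in embedding space keeps inference cost equal to that of a single prompt per class, because the ensemble collapses into one vector per class before any image is scored.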

Findings

1. GPE improves CLIP's adaptability to new domains.
2. GPE outperforms existing methods in cross-dataset transfer tasks.
3. GPE enhances robustness against data distribution shifts.

Abstract

The advancement of vision-language models, particularly the Contrastive Language-Image Pre-training (CLIP) model, has revolutionized the field of machine learning by enabling robust zero-shot learning capabilities. These capabilities allow models to understand and respond to previously unseen data without task-specific training. However, adapting CLIP to integrate specialized knowledge from various domains while retaining its zero-shot capabilities remains a significant challenge. To address this, we introduce a novel prompt ensemble learning approach called Group-wise Prompt Ensemble (GPE). This method aims to enhance CLIP's zero-shot capabilities by incorporating new domain knowledge while improving its adaptability and robustness against data distribution shifts. Our approach hinges on three main strategies: prompt grouping with masked attention to optimize CLIP's adaptability while…
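The abstract lists prompt grouping with masked attention as one of GPE's strategies, but it is cut off before the details. The sketch below rests on an assumption about how such masking could work and is not the paper's exact scheme. It builds a boolean attention mask in which each prompt group can attend only to its own tokens and to a set of shared tokens (for example, class-name tokens), while the shared tokens attend only to each other. Several groups can then share one forward pass without influencing one another. The token layout and the helper names `group_attention_mask` and `masked_attention` are hypothetical.

```python
import numpy as np


def group_attention_mask(num_shared, num_groups, group_size):
    """Boolean (T, T) mask; True where query i may attend to key j.

    Assumed token layout: [shared tokens][group 0]...[group G-1].
    """
    # -1 marks shared tokens; g >= 0 marks tokens of prompt group g.
    group_id = np.concatenate([
        np.full(num_shared, -1),
        np.repeat(np.arange(num_groups), group_size),
    ])
    same_group = group_id[:, None] == group_id[None, :]
    key_is_shared = (group_id == -1)[None, :]
    # Prompt tokens see their own group plus shared tokens;
    # shared tokens see only shared tokens.
    return same_group | key_is_shared


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention. Disallowed positions are
    set to -inf before the softmax, so they receive exactly zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
num_shared, num_groups, group_size, dim = 2, 3, 4, 16
T = num_shared + num_groups * group_size
x = rng.normal(size=(T, dim))
mask = group_attention_mask(num_shared, num_groups, group_size)
out = masked_attention(x, x, x, mask)

# Run group 1 on its own (shared tokens + its prompts) and compare.
idx = np.r_[0:num_shared, num_shared + group_size:num_shared + 2 * group_size]
alone = masked_attention(x[idx], x[idx], x[idx],
                         group_attention_mask(num_shared, 1, group_size))
print(np.allclose(out[idx], alone))
```

The final comparison checks the intended isolation: group 1's outputs computed inside the joint, masked pass should match those obtained by running its tokens together with only the shared tokens.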

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training