Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models

Niloufar Alipour Talemi; Hossein Kashiani; Fatemeh Afghah

arXiv:2411.16018·cs.CV·November 26, 2024

Open Access

TL;DR

Style-Pro introduces a style-guided prompt learning framework for vision-language models like CLIP, effectively reducing overfitting and enhancing generalization across unseen domains and classes by synthesizing diverse style shifts.

Contribution

It proposes a novel style-guided prompt learning method with style bases and consistency constraints to improve the adaptability of pre-trained VL models.

Findings

01. Outperforms state-of-the-art methods on 11 benchmark datasets.

02. Enhances zero-shot and domain generalization capabilities.

03. Effectively mitigates overfitting in prompt learning.

Abstract

Pre-trained vision-language (VL) models, such as CLIP, have shown significant generalization ability to downstream tasks, even with minimal fine-tuning. While prompt learning has emerged as an effective strategy to adapt pre-trained VL models for downstream tasks, current approaches frequently encounter severe overfitting to specific downstream data distributions. This overfitting constrains the original ability of the VL models to generalize to new domains or unseen classes, posing a critical challenge in enhancing the adaptability and generalization of VL models. To address this limitation, we propose Style-Pro, a novel style-guided prompt learning framework that mitigates overfitting and preserves the zero-shot generalization capabilities of CLIP. Style-Pro employs learnable style bases to synthesize diverse distribution shifts, guided by two specialized loss functions that ensure…
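
The abstract (truncated here) does not say how the style bases are parameterized. The sketch below is only a rough illustration of what "synthesizing a style shift from a set of bases" could look like. It assumes that a style is a per-channel (mean, standard deviation) pair, that a new style is a softmax-weighted mixture of the bases, and that the shift is applied by AdaIN-style re-normalization. All function names, shapes, and values are hypothetical and are not taken from the paper; the two loss functions are not modeled.

```python
import math
import random


def channel_stats(channel, eps=1e-6):
    """Return (mean, std) of one channel's activations."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return mean, math.sqrt(var + eps)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_style_bases(bases, logits):
    """Convex combination of K style bases (assumed parameterization).

    Each base is a list of (mean, std) pairs, one pair per channel.
    In a real model the bases and logits would be learnable parameters.
    """
    weights = softmax(logits)
    mixed = []
    for c in range(len(bases[0])):
        mu = sum(w * base[c][0] for w, base in zip(weights, bases))
        sigma = sum(w * base[c][1] for w, base in zip(weights, bases))
        mixed.append((mu, sigma))
    return mixed


def restyle(feature_map, target_style):
    """AdaIN-style shift: normalize each channel, then apply target stats."""
    out = []
    for channel, (t_mu, t_sigma) in zip(feature_map, target_style):
        mu, sigma = channel_stats(channel)
        out.append([t_sigma * (v - mu) / sigma + t_mu for v in channel])
    return out


# Toy data: 3 channels with 16 activations each, and 2 hypothetical style bases.
random.seed(0)
feature_map = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(3)]
bases = [
    [(0.0, 1.0), (0.5, 2.0), (-1.0, 0.5)],
    [(2.0, 0.5), (-0.5, 1.0), (1.0, 1.5)],
]
logits = [0.0, 0.0]  # equal mixing weights

new_style = mix_style_bases(bases, logits)
styled = restyle(feature_map, new_style)
```

With equal logits the mixed style is simply the average of the two bases, so each restyled channel ends up with approximately that averaged mean and standard deviation. Changing the logits moves the synthesized style between the bases, which is one plausible way to generate a range of distribution shifts from a small set of parameters.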

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training