Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
Gahyeon Kim, Sohee Kim, Seokju Lee

TL;DR
This paper introduces AAPL, a novel method that uses adversarial token embeddings to decouple augmentation bias from semantic features in prompt learning, significantly improving generalization in vision-language models.
Contribution
AAPL is the first approach to explicitly separate augmentation-induced superficial variations from semantic features in prompt learning for vision-language models.
Findings
AAPL outperforms existing methods across multiple benchmarks.
Image-based augmentation enhances prompt learning effectiveness.
Decoupling improves generalization in few-shot and zero-shot tasks.
Abstract
Recent advances in large-scale vision and language models have led to significant progress in zero-shot learning tasks. Methods such as CoOp and CoCoOp have shown that replacing handcrafted prompts with learnable vectors, known as prompt learning, can result in improved performance. However, these models often struggle to generalize to entirely unseen categories. While traditional zero-shot learning techniques benefit from various data augmentation strategies, prompt learning has primarily focused on text-based modifications, leaving the potential of image-based augmentation largely unexplored. In this work, we explore how image-level augmentations, particularly those that introduce attribute-specific variations, can support and enhance prompt learning. Our analysis examines the interaction between these augmentations and soft prompt frameworks, revealing their potential to improve…
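To make the prompt-learning idea in the abstract concrete, here is a minimal sketch of a CoOp-style soft prompt: a small set of shared, learnable context vectors is placed in front of each class-name embedding, and classification compares the resulting text features with an image feature. It is an illustration under stated assumptions, not AAPL's method, whose adversarial token embeddings are not specified in this summary. The helper names (`mean_pool`, `build_prompt`, `class_probabilities`), the toy vectors, the mean-pooling stand-in for the frozen text encoder, and the temperature value are all hypothetical choices made for the example.

```python
import math

def mean_pool(tokens):
    """Stand-in for a frozen text encoder: average the token vectors.
    (Assumption for illustration; a real model uses a transformer.)"""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_prompt(context, class_embedding):
    """Soft prompt: shared learnable context vectors, then the class-name embedding."""
    return context + [class_embedding]

def class_probabilities(image_feature, context, class_embeddings, temperature=0.07):
    """Score each class by similarity between the image and its prompt's text feature.
    The temperature is an assumed value, not one taken from the paper."""
    text_features = [mean_pool(build_prompt(context, c)) for c in class_embeddings]
    logits = [cosine(image_feature, t) / temperature for t in text_features]
    return softmax(logits)

# Toy data (hypothetical): two context vectors and three class-name embeddings.
context = [[0.01, 0.01, 0.01, 0.01], [-0.01, 0.02, 0.0, 0.01]]
class_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
image_feature = [0.1, 0.9, 0.0, 0.1]  # toy image feature closest to class 1

probs = class_probabilities(image_feature, context, class_embeddings)
predicted_class = probs.index(max(probs))
```

In actual prompt learning only the `context` vectors would be updated by gradient descent on a classification loss while both encoders stay frozen; the sketch shows the forward pass only.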
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Face recognition and analysis
