Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models

Gahyeon Kim; Sohee Kim; Seokju Lee

arXiv:2511.03367·cs.CV·November 6, 2025

TL;DR

This paper introduces AAPL, a novel method that uses adversarial token embeddings to decouple augmentation bias from semantic features in prompt learning, significantly improving generalization in vision-language models.

Contribution

AAPL is the first approach to explicitly separate augmentation-induced superficial variations from semantic features in prompt learning for vision-language models.

Findings

01. AAPL outperforms existing methods across multiple benchmarks.

02. Image-based augmentation enhances prompt learning effectiveness.

03. Decoupling improves generalization in few-shot and zero-shot tasks.

Abstract

Recent advances in large-scale vision and language models have led to significant progress in zero-shot learning tasks. Methods such as CoOp and CoCoOp have shown that replacing handcrafted prompts with learnable vectors, known as prompt learning, can result in improved performance. However, these models often struggle to generalize to entirely unseen categories. While traditional zero-shot learning techniques benefit from various data augmentation strategies, prompt learning has primarily focused on text-based modifications, leaving the potential of image-based augmentation largely unexplored. In this work, we explore how image-level augmentations, particularly those that introduce attribute-specific variations, can support and enhance prompt learning. Our analysis examines the interaction between these augmentations and soft prompt frameworks, revealing their potential to improve…
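
To make the abstract's notion of "replacing handcrafted prompts with learnable vectors" concrete, here is a minimal sketch of how a CoOp-style soft prompt can be assembled and scored against an image feature. It is not the AAPL method and not the authors' code. The names `build_prompt`, `toy_text_encoder`, and `class_probabilities`, the mean-pooling encoder, the dimensions, and the temperature value are all assumptions made for illustration. In real prompt learning the context vectors would be trained by gradient descent while the encoders stay frozen; training is omitted here.

```python
import math
import random

def build_prompt(context, class_tokens):
    """Prepend the shared learnable context vectors to one class's token embeddings."""
    return context + class_tokens

def toy_text_encoder(prompt):
    """Hypothetical stand-in for a frozen text encoder: mean-pool the token embeddings."""
    dim = len(prompt[0])
    return [sum(tok[d] for tok in prompt) / len(prompt) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def class_probabilities(image_feature, context, class_token_table, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities, one logit per class."""
    logits = [
        cosine(image_feature, toy_text_encoder(build_prompt(context, tokens))) / temperature
        for tokens in class_token_table
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative values only: random embeddings instead of a real model.
rng = random.Random(0)
dim, n_ctx, n_classes = 8, 4, 3
# The context vectors are the only parameters a soft-prompt method would train.
context = [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]
# One token embedding per class name, assumed fixed.
class_token_table = [[[rng.gauss(0, 1) for _ in range(dim)]] for _ in range(n_classes)]
image_feature = [rng.gauss(0, 1) for _ in range(dim)]

probs = class_probabilities(image_feature, context, class_token_table)
print(probs)
```

Because the same context vectors are shared across every class prompt, anything they absorb during training affects all classes at once, which is one way to see why bias picked up from augmented images could matter for generalization to unseen categories.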

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Face recognition and analysis