AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Gahyeon Kim; Sohee Kim; Seokju Lee

arXiv:2404.16804 · cs.CV · April 26, 2024 · 1 citation

TL;DR

This paper introduces AAPL, a prompt learning method that adds attributes to learned prompts so that vision-language models generalize better to unseen classes, particularly in zero-shot and few-shot tasks.

Contribution

The paper proposes adversarial token embedding and a new attribute-adding mechanism to enhance prompt learning for better unseen class generalization in vision-language models.

Findings

01 AAPL outperforms existing methods in zero-shot and few-shot learning.

02 AAPL demonstrates strong cross-dataset and domain generalization performance.

03 The method effectively disentangles visual bias from class features.

Abstract

Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improvement for unseen classes is still marginal, and to tackle this problem, data augmentation has been frequently used in traditional zero-shot learning techniques. Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes. To address this problem, we propose adversarial token embedding to disentangle low-level visual augmentation…
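The prompt learning idea mentioned in the abstract, where the context words of a prompt are replaced with learnable vectors, can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the AAPL, CoOp, or CoCoOp implementation: the embedding width, the mean-pooling "text encoder", the random class-token embeddings, the temperature value, and every function name are hypothetical stand-ins chosen so the example runs with the Python standard library alone. No training step is shown.

```python
import math
import random

random.seed(0)
DIM = 8      # toy embedding width (assumption; real models are far wider)
N_CTX = 4    # number of learnable context vectors (assumption)

# Learnable context: these vectors take the place of the words in a
# hand-written template such as "a photo of a". In real prompt learning
# they would be updated by gradient descent; here they are only initialized.
context = [[random.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(N_CTX)]

# Frozen class-name token embeddings (random stand-ins for a real
# tokenizer and embedding table).
class_tokens = {
    name: [random.gauss(0.0, 1.0) for _ in range(DIM)]
    for name in ("cat", "dog", "car")
}

def build_prompt(ctx, cls_vec):
    """Prompt = [ctx_1, ..., ctx_M, class token], shared ctx across classes."""
    return ctx + [cls_vec]

def toy_text_encoder(prompt):
    """Hypothetical stand-in for a frozen text encoder: mean-pool the tokens."""
    n = len(prompt)
    return [sum(tok[i] for tok in prompt) / n for i in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(image_feature, ctx, temperature=0.01):
    """Softmax over cosine similarities between the image and each class prompt."""
    logits = {
        name: cosine(image_feature, toy_text_encoder(build_prompt(ctx, vec))) / temperature
        for name, vec in class_tokens.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}

# Toy "image feature": reuse the encoded "dog" prompt so the match is exact.
image_feature = toy_text_encoder(build_prompt(context, class_tokens["dog"]))
probs = classify(image_feature, context)
predicted = max(probs, key=probs.get)
```

In this sketch only `context` would receive gradients during training, while the encoder and class-token embeddings stay frozen; that split is what distinguishes learned prompts from manually crafted ones.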

Code & Models

Repositories

Gahyeonkim09/AAPL (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Context Optimization