Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

Man Liu; Huihui Bai; Feng Li; Chunjie Zhang; Yunchao Wei; Tat-Seng Chua; Yao Zhao

arXiv:2406.03032 · cs.CV · March 11, 2025 · 1 citation

TL;DR

This paper introduces AENet, a novel approach for zero-shot learning that enriches visual prompts with semantic information, improving transferability and outperforming existing methods on benchmark datasets.

Contribution

AENet integrates semantic information into visual prompts through concept-harmonized tokens and residual refinement, improving zero-shot learning performance.

Findings

01 Outperforms state-of-the-art ZSL methods on three benchmarks

02 Effectively incorporates semantic information into visual prompts

03 Enhances generalization to unseen categories

Abstract

Zero-shot learning (ZSL) aims to transfer knowledge from seen categories to recognize unseen categories, and it mostly relies on semantic-visual interactions between image and attribute tokens. Recently, prompt learning has emerged in ZSL and demonstrated significant potential, as it allows the zero-shot transfer of diverse visual concepts to downstream tasks. However, current methods explore a fixed adaptation of learnable prompts on seen domains, which makes them over-emphasize the primary visual features observed during training and limits their generalization to unseen domains. In this work, we propose AENet, which injects semantic information into the visual prompt to distill a semantic-enhanced prompt for visual representation enrichment, enabling effective knowledge transfer for ZSL. AENet comprises two key steps: 1) exploring the concept-harmonized tokens for the…

Taxonomy

Topics: Geophysical Methods and Applications · Domain Adaptation and Few-Shot Learning