Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning
Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Tat-Seng Chua, Yao Zhao

TL;DR
This paper introduces AENet, a zero-shot learning approach that enriches visual prompts with semantic information, improving transferability to unseen categories and outperforming existing methods on three benchmark datasets.
Contribution
AENet innovatively integrates semantic information into visual prompts through concept-harmonized tokens and residual refinement, enhancing zero-shot learning performance.
Findings
Outperforms state-of-the-art ZSL methods on three benchmarks
Effectively incorporates semantic information into visual prompts
Enhances generalization to unseen categories
Abstract
Zero-shot learning (ZSL) endeavors to transfer knowledge from seen categories to recognize unseen categories, which mostly relies on the semantic-visual interactions between image and attribute tokens. Recently, prompt learning has emerged in ZSL and demonstrated significant potential, as it allows the zero-shot transfer of diverse visual concepts to downstream tasks. However, current methods explore the fixed adaptation of learnable prompts on seen domains, which makes them over-emphasize the primary visual features observed during training and limits their generalization to unseen domains. In this work, we propose AENet, which injects semantic information into the visual prompt to distill a semantic-enhanced prompt for visual representation enrichment, enabling effective knowledge transfer for ZSL. AENet comprises two key steps: 1) exploring the concept-harmonized tokens for the…
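To make the seen-to-unseen transfer described in the abstract concrete, the following is a minimal sketch of generic attribute-based zero-shot classification. It is not AENet and does not reproduce the paper's prompt mechanism. The function name `zero_shot_predict`, the projection matrix `W`, and the toy attribute vectors are all hypothetical, chosen only for illustration. It assumes a visual-to-attribute projection has already been learned on seen classes.

```python
import numpy as np

def zero_shot_predict(visual_feats, W, class_attrs):
    """Assign each image to the unseen class with the most similar attribute vector.

    visual_feats: (n_images, d_visual) image features
    W:            (d_visual, d_attr) projection assumed to be learned on seen classes
    class_attrs:  (n_classes, d_attr) attribute vectors of the unseen classes
    """
    # Map visual features into the attribute (semantic) space.
    projected = visual_feats @ W
    # L2-normalize both sides so the dot product is cosine similarity.
    projected = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    attrs = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    scores = projected @ attrs.T  # (n_images, n_classes)
    return scores.argmax(axis=1)

# Toy data (hypothetical): two unseen classes described by three attributes.
class_attrs = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])
W = np.eye(3)  # identity projection, purely for illustration
visual_feats = np.array([[0.9, 0.1, 0.8],
                         [0.1, 0.7, 0.2]])
preds = zero_shot_predict(visual_feats, W, class_attrs)
```

Because prediction only needs the attribute vectors of the target classes, no labeled images of those classes are required at test time, which is the property ZSL methods rely on.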
Taxonomy
Topics: Geophysical Methods and Applications · Domain Adaptation and Few-Shot Learning
