Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning
Ziming Liu, Jingcai Guo, Song Guo, Xiaocheng Lu

TL;DR
This paper introduces Epsilon, a comprehensive visual-semantic framework for multi-label zero-shot learning that effectively integrates local and global features, leading to improved recognition of unseen classes in images.
Contribution
The paper proposes a novel framework that combines semantic prompt aggregation and global propagation to enhance multi-label zero-shot learning performance.
Findings
Epsilon outperforms state-of-the-art methods on the NUS-WIDE and Open Images V4 datasets.
Effective semantic aggregation improves recognition accuracy.
Global feature collection reduces bias and enhances robustness.
Abstract
This paper investigates the challenging problem of multi-label zero-shot learning (MLZSL), in which a model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge such as semantic information. Existing methods usually analyze the relationships among the seen classes in a sample along spatial or semantic dimensions and then transfer the learned model to unseen classes. However, they neglect the integrity of local and global features. Although an attention structure can accurately locate local features, especially objects, it significantly compromises their integrity, and the relationships between classes are affected as well. Coarse processing of global features likewise directly affects comprehensiveness. This neglect will make the model lose its grasp of…
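The MLZSL setting described in the abstract can be illustrated with a minimal, generic sketch: an image feature is projected into the semantic space and scored against the semantic embeddings of unseen classes, and every class above a threshold is predicted. This is not the Epsilon architecture. The projection matrix `W`, the toy class embeddings, the helper names, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def project(W, x):
    """Linear visual-to-semantic projection: each row of W yields one semantic dimension."""
    return [dot(row, x) for row in W]

def predict_labels(visual_feature, W, class_embeddings, threshold=0.5):
    """Score every class against the projected feature and keep those above the threshold."""
    z = project(W, visual_feature)
    scores = {name: cosine(z, emb) for name, emb in class_embeddings.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    kept.sort(key=lambda name: -scores[name])  # highest-scoring label first
    return kept, scores

# Toy data (assumed): a 3-d visual feature, a 2-d semantic space, three unseen classes.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
unseen_classes = {"dog": [1.0, 0.0], "beach": [0.0, 1.0], "car": [-1.0, 0.0]}
labels, scores = predict_labels([0.8, 0.6, 0.3], W, unseen_classes)
print(labels)
```

With this toy input the image receives two labels at once, "dog" and "beach", while "car" is rejected. In practice `W` would be learned on seen classes only, and the class embeddings would come from auxiliary semantic information such as word vectors.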
Taxonomy
Topics: Pharmacy and Medical Practices · Ideological and Political Education
Methods: Softmax · Attention Is All You Need
