Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning

Ziming Liu; Jingcai Guo; Song Guo; Xiaocheng Lu

arXiv:2408.12253 · cs.CV · August 27, 2024

TL;DR

This paper introduces Epsilon, a comprehensive visual-semantic framework for multi-label zero-shot learning that effectively integrates local and global features, leading to improved recognition of unseen classes in images.

Contribution

The paper proposes a novel framework that combines semantic prompt aggregation and global propagation to enhance multi-label zero-shot learning performance.

Findings

01 Epsilon outperforms state-of-the-art methods on the NUS-WIDE and Open Images V4 datasets.

02 Effective semantic aggregation improves recognition accuracy.

03 Global feature collection reduces bias and enhances robustness.

Abstract

This paper investigates zero-shot learning in the multi-label scenario (MLZSL), in which a model is trained to recognize multiple unseen classes within a sample (e.g., an image) using seen classes and auxiliary knowledge such as semantic information. Existing methods usually analyze the relationships among the seen classes in a sample along spatial or semantic dimensions and then transfer the learned model to unseen classes. However, they neglect the integrity of local and global features. Although an attention structure can accurately locate local features, especially objects, it significantly compromises their integrity and also distorts the relationships between classes. Coarse processing of global features likewise undermines comprehensiveness. This neglect will make the model lose its grasp of…

Taxonomy

Topics: Pharmacy and Medical Practices · Ideological and Political Education

Methods: Softmax · Attention Is All You Need