Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification
Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin

TL;DR
This paper introduces ACMR, an autoencoder-based approach that aligns visual and semantic features for improved generalized zero-shot classification, addressing domain shift issues with novel alignment and enhancement modules.
Contribution
The paper proposes a new autoencoder network with a Vision-Semantic Alignment (VSA) method and an Information Enhancement Module (IEM) for better cross-modal feature alignment in GZSC.
Findings
Achieves state-of-the-art results on benchmark datasets.
Effectively reduces domain shift in zero-shot learning.
Improves discriminative power of latent features.
Abstract
Learning a common latent embedding by aligning the latent spaces of cross-modal autoencoders is an effective strategy for Generalized Zero-Shot Classification (GZSC). However, due to the lack of fine-grained instance-wise annotations, it still easily suffers from the domain shift problem caused by the discrepancy between the visual representation of diversified images and the semantic representation of fixed attributes. In this paper, we propose an innovative autoencoder network that learns Aligned Cross-Modal Representations (dubbed ACMR) for GZSC. Specifically, we propose a novel Vision-Semantic Alignment (VSA) method to strengthen the alignment of cross-modal latent features on the latent subspaces guided by a learned classifier. In addition, we propose a novel Information Enhancement Module (IEM) to reduce the possibility of latent variable collapse while encouraging the…
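To make the general idea of aligning the latent spaces of cross-modal autoencoders concrete, here is a minimal sketch. It is not the ACMR method: the VSA and IEM components are not detailed in the abstract, so the sketch swaps in a generic alignment term, the squared 2-Wasserstein distance between diagonal Gaussians. The function names (`encode`, `alignment_distance`), the linear encoders, and all dimensions are hypothetical choices made only for illustration.

```python
import numpy as np


def encode(x, w_mu, w_logvar):
    """Hypothetical linear encoder: returns mean and log-variance of a diagonal Gaussian."""
    return x @ w_mu, x @ w_logvar


def alignment_distance(mu_a, logvar_a, mu_b, logvar_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians, averaged over the batch.

    This is a generic stand-in for a latent alignment loss, not the paper's VSA.
    """
    std_a = np.exp(0.5 * logvar_a)
    std_b = np.exp(0.5 * logvar_b)
    per_sample = np.sum((mu_a - mu_b) ** 2, axis=1) + np.sum((std_a - std_b) ** 2, axis=1)
    return float(np.mean(per_sample))


rng = np.random.default_rng(0)

# Assumed sizes: visual features, class attribute vectors, shared latent space, batch.
visual_dim, attr_dim, latent_dim, batch = 16, 8, 4, 5

visual = rng.normal(size=(batch, visual_dim))  # stand-in for image features
attrs = rng.normal(size=(batch, attr_dim))     # stand-in for class attributes

# One encoder per modality, both mapping into the same latent dimensionality.
wv_mu = rng.normal(scale=0.1, size=(visual_dim, latent_dim))
wv_lv = rng.normal(scale=0.1, size=(visual_dim, latent_dim))
ws_mu = rng.normal(scale=0.1, size=(attr_dim, latent_dim))
ws_lv = rng.normal(scale=0.1, size=(attr_dim, latent_dim))

mu_v, lv_v = encode(visual, wv_mu, wv_lv)
mu_s, lv_s = encode(attrs, ws_mu, ws_lv)

# Minimizing this term during training would pull the two latent distributions together.
align_loss = alignment_distance(mu_v, lv_v, mu_s, lv_s)
```

In a full model, a term like this would be added to each autoencoder's reconstruction loss, so that visual and semantic inputs of the same class land near each other in the shared latent space.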
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research · Multimodal Machine Learning Applications
