Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification

Zhiyu Fang; Xiaobin Zhu; Chun Yang; Zheng Han; Jingyan Qin; Xu-Cheng Yin

arXiv:2112.12927·cs.CV·December 28, 2021

TL;DR

This paper introduces ACMR, an autoencoder-based approach that aligns visual and semantic features for improved generalized zero-shot classification, addressing domain shift issues with novel alignment and enhancement modules.

Contribution

The paper proposes a new autoencoder network with a Vision-Semantic Alignment (VSA) method and an Information Enhancement Module (IEM) for better cross-modal feature alignment in Generalized Zero-Shot Classification (GZSC).

Findings

01 Achieves state-of-the-art results on benchmark datasets.

02 Effectively reduces domain shift in zero-shot learning.

03 Improves discriminative power of latent features.

Abstract

Learning a common latent embedding by aligning the latent spaces of cross-modal autoencoders is an effective strategy for Generalized Zero-Shot Classification (GZSC). However, due to the lack of fine-grained instance-wise annotations, this strategy still easily suffers from the domain shift problem, which arises from the discrepancy between the visual representation of diversified images and the semantic representation of fixed attributes. In this paper, we propose an innovative autoencoder network that learns Aligned Cross-Modal Representations (dubbed ACMR) for GZSC. Specifically, we propose a novel Vision-Semantic Alignment (VSA) method to strengthen the alignment of cross-modal latent features on the latent subspaces guided by a learned classifier. In addition, we propose a novel Information Enhancement Module (IEM) to reduce the possibility of latent variable collapse while encouraging the…
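
The general idea named in the abstract, aligning visual and semantic latent features with guidance from a classifier, can be illustrated with a small NumPy sketch. This is not the paper's VSA method or its loss functions, which the truncated abstract does not spell out. It assumes linear encoders, a squared-distance alignment term, and a single classifier shared by both modalities; all names (`alignment_losses`, `W_v`, `W_s`, `W_c`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())

def alignment_losses(x_vis, attrs, labels, W_v, W_s, W_c):
    """Illustrative losses for cross-modal latent alignment (assumed form).

    x_vis : (n, d_v) visual features
    attrs : (num_classes, d_a) per-class attribute vectors
    labels: (n,) integer class labels
    W_v, W_s : hypothetical linear encoders into a shared latent space
    W_c : hypothetical linear classifier applied to both latents
    """
    z_v = x_vis @ W_v            # visual latents
    z_s = attrs[labels] @ W_s    # semantic latents of each sample's class
    # Pull each visual latent toward the latent of its class attributes.
    align = float(((z_v - z_s) ** 2).sum(axis=1).mean())
    # The shared classifier must separate classes in both modalities.
    cls = cross_entropy(z_v @ W_c, labels) + cross_entropy(z_s @ W_c, labels)
    return align, cls

# Toy data: 8 samples, 3 classes, 16-d visual, 5-d attributes, 4-d latent.
n, num_classes, d_v, d_a, d_z = 8, 3, 16, 5, 4
x_vis = rng.normal(size=(n, d_v))
attrs = rng.normal(size=(num_classes, d_a))
labels = rng.integers(0, num_classes, size=n)
W_v = rng.normal(size=(d_v, d_z)) * 0.1
W_s = rng.normal(size=(d_a, d_z)) * 0.1
W_c = rng.normal(size=(d_z, num_classes)) * 0.1

align, cls = alignment_losses(x_vis, attrs, labels, W_v, W_s, W_c)
print("alignment loss:", align, "classification loss:", cls)
```

In a setup like this, the two terms would typically be added to each autoencoder's reconstruction loss and minimized jointly. At test time, unseen-class attributes would be encoded with `W_s` and compared against visual latents in the shared space.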

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research · Multimodal Machine Learning Applications