Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts

Nihar Bendre; Kevin Desai; Peyman Najafirad

arXiv:2106.14082 · cs.CV · June 29, 2021

TL;DR

This paper introduces a Multimodal Variational Auto-Encoder that learns a shared latent space from image features and semantic data, improving generalized zero-shot learning by leveraging local and global semantic knowledge.

Contribution

It proposes a novel M-VAE model that integrates multimodal data into a shared latent space with a multi-modal loss, enhancing zero-shot learning performance.

Findings

01 Outperforms state-of-the-art methods on four benchmark datasets.

02 Effectively correlates modalities to improve novel sample prediction.

03 Utilizes local and global semantic knowledge for better generalization.

Abstract

With the ever-increasing amount of data, the central challenge in multimodal learning is the limited availability of labelled samples. For classification, techniques such as meta-learning, zero-shot learning, and few-shot learning can learn about novel classes from prior knowledge. Recent techniques try to learn a cross-modal mapping between the semantic space and the image space. However, they tend to ignore local and global semantic knowledge. To overcome this problem, we propose a Multimodal Variational Auto-Encoder (M-VAE) that learns a shared latent space of image features and the semantic space. In our approach, we concatenate multimodal data into a single embedding before passing it to the VAE to learn the latent space. We propose the use of a multi-modal loss during the reconstruction of the feature embedding through the…
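The two steps the abstract names, concatenating the modalities into one embedding and scoring the VAE's reconstruction with a multi-modal loss, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it defines the loss, so the per-modality mean-squared-error terms, the weights `w_img`, `w_sem`, and `beta`, and all function names here are assumptions. The encoder and decoder networks are left out; only the concatenation, the reparameterization step, and the loss are shown.

```python
import math
import random


def concat_modalities(image_feat, semantic_emb):
    """Join image features and semantic embedding into a single input vector."""
    return list(image_feat) + list(semantic_emb)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (standard VAE trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multimodal_vae_loss(x, x_hat, n_img, mu, logvar,
                        w_img=1.0, w_sem=1.0, beta=1.0):
    """Assumed form of a multi-modal loss: one reconstruction term per
    modality slice of the concatenated vector, plus a weighted KL term.
    The first n_img entries are image features, the rest are semantic."""
    rec_img = mse(x[:n_img], x_hat[:n_img])
    rec_sem = mse(x[n_img:], x_hat[n_img:])
    kl = kl_to_standard_normal(mu, logvar)
    return w_img * rec_img + w_sem * rec_sem + beta * kl


# Usage with toy numbers: 2 image features, 1 semantic attribute.
x = concat_modalities([1.0, 2.0], [0.5])
x_hat = [1.0, 2.0, 1.5]  # hypothetical decoder output
loss = multimodal_vae_loss(x, x_hat, n_img=2, mu=[0.0, 0.0], logvar=[0.0, 0.0])
```

Splitting the reconstruction error by modality keeps the higher-dimensional image features from dominating the semantic term, which is one plausible reason to prefer a multi-modal loss over a single reconstruction error on the concatenated vector.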
