Joint Multimodal Learning with Deep Generative Models

Masahiro Suzuki; Kotaro Nakayama; Yutaka Matsuo

arXiv:1611.01891 · stat.ML · November 8, 2016 · 125 cites

TL;DR

This paper introduces JMVAE, a deep generative model that learns a joint representation for multiple modalities, enabling bi-directional generation and reconstruction of modalities such as images and text.

Contribution

The paper proposes a joint multimodal variational autoencoder (JMVAE) and an enhanced version, JMVAE-kl, which model multimodal data and generate missing modalities.

Findings

01. JMVAE captures high-level joint representations of multiple modalities.

02. JMVAE outperforms conventional VAEs in generating and reconstructing modalities.

03. JMVAE can generate multiple modalities in both directions.

Abstract

We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such as variational autoencoders (VAEs). However, these models typically assume that modalities are forced to have a conditioned relation, i.e., we can only generate modalities in one direction. To achieve our objective, we should extract a joint representation that captures high-level concepts among all modalities and through which we can exchange them bi-directionally. As described herein, we propose a joint multimodal variational autoencoder (JMVAE), in which all modalities are independently conditioned on joint representation. In other words, it models a joint distribution of modalities. Furthermore, to be able to generate missing modalities from the…
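
The factorization stated in the abstract can be written out explicitly. The following is a minimal sketch, assuming two modalities denoted x (e.g., an image) and w (e.g., a text) and a shared latent variable z as the joint representation; the symbols, the encoder q, and the use of the standard VAE lower bound are illustrative assumptions rather than notation quoted from the text above.

```latex
% Assumed notation: x and w are the two modalities, z is the joint representation.
% Each modality is conditioned independently on z, giving a joint distribution:
p_\theta(x, w) = \int p_\theta(x \mid z)\, p_\theta(w \mid z)\, p(z)\, dz

% Standard variational lower bound, with an assumed encoder q_\phi(z \mid x, w):
\log p_\theta(x, w) \ge
  \mathbb{E}_{q_\phi(z \mid x, w)}\!\left[ \log p_\theta(x \mid z) + \log p_\theta(w \mid z) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, w) \,\|\, p(z) \right)
```

Under this factorization either modality can be decoded from z, which is what allows exchange in both directions instead of a single fixed conditioning direction.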

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques
