Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models

Yuge Shi; N. Siddharth; Brooks Paige; Philip H.S. Torr

arXiv:1911.03393 · stat.ML · November 11, 2019 · 89 cites

TL;DR

This paper introduces a novel mixture-of-experts variational autoencoder designed for multi-modal data, effectively capturing shared and private features, enabling coherent generation across modalities, and improving individual modality learning.

Contribution

The paper proposes a new multimodal variational autoencoder that satisfies four key criteria for effective multi-modal generative modeling, including shared/private decomposition and cross-modal coherence.

Findings

01 Successfully models multiple data modalities, including image and language

02 Achieves coherent joint and cross-generation across modalities

03 Improves individual modality learning through multi-modal integration

Abstract

Learning generative models that span multiple data modalities, such as vision and language, is often motivated by the desire to learn more useful, generalisable representations that faithfully capture common underlying factors between the modalities. In this work, we characterise successful learning of such models as the fulfillment of four criteria: i) implicit latent decomposition into shared and private subspaces, ii) coherent joint generation over all modalities, iii) coherent cross-generation across individual modalities, and iv) improved model learning for individual modalities through multi-modal integration. Here, we propose a mixture-of-experts multimodal variational autoencoder (MMVAE) to learn generative models on different sets of modalities, including a challenging image-language dataset, and demonstrate its ability to satisfy all four criteria, both qualitatively and…
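As a rough illustration of the mixture-of-experts idea named in the abstract, the sketch below treats the joint posterior over the latent variable as an equally weighted mixture of per-modality Gaussian posteriors. The equal weighting, the diagonal-Gaussian experts, and the names `sample_moe_posterior` and `moe_density` are assumptions made for this example and are not taken from the authors' code.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def sample_moe_posterior(experts, rng):
    """Draw one latent sample from a uniform mixture of experts.

    experts: list of (mean, std) pairs, one per modality (assumed diagonal Gaussians).
    """
    # Choose one expert uniformly, i.e. each modality has weight 1/M (assumption).
    mean, std = experts[rng.randrange(len(experts))]
    # Sample each latent dimension independently from the chosen expert.
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def moe_density(z, experts):
    """Mixture density at z: the average of the per-expert densities."""
    total = 0.0
    for mean, std in experts:
        p = 1.0
        for zi, m, s in zip(z, mean, std):
            p *= gaussian_pdf(zi, m, s)
        total += p
    return total / len(experts)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical unimodal posteriors, e.g. one from an image encoder
    # and one from a text encoder.
    experts = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
    z = sample_moe_posterior(experts, rng)
    print(len(z), moe_density(z, experts) > 0)
```

Because every sample comes from a single modality's expert, each unimodal encoder has to produce latents that are usable on their own, which is one intuition for how a mixture supports cross-modal generation.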

Taxonomy

Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Natural Language Processing Techniques