Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models
Yuge Shi, N. Siddharth, Brooks Paige, Philip H.S. Torr

TL;DR
This paper introduces a novel mixture-of-experts variational autoencoder designed for multi-modal data, effectively capturing shared and private features, enabling coherent generation across modalities, and improving individual modality learning.
Contribution
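The paper proposes a mixture-of-experts multimodal variational autoencoder (MMVAE) and shows that it satisfies four criteria for multi-modal generative modelling: latent decomposition into shared and private subspaces, coherent joint generation, coherent cross-generation, and improved learning of individual modalities.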
Findings
Successfully models multiple data modalities including image and language
Achieves coherent joint and cross-generation across modalities
Improves individual modality learning through multi-modal integration
Abstract
Learning generative models that span multiple data modalities, such as vision and language, is often motivated by the desire to learn more useful, generalisable representations that faithfully capture common underlying factors between the modalities. In this work, we characterise successful learning of such models as the fulfillment of four criteria: i) implicit latent decomposition into shared and private subspaces, ii) coherent joint generation over all modalities, iii) coherent cross-generation across individual modalities, and iv) improved model learning for individual modalities through multi-modal integration. Here, we propose a mixture-of-experts multimodal variational autoencoder (MMVAE) to learn generative models on different sets of modalities, including a challenging image-language dataset, and demonstrate its ability to satisfy all four criteria, both qualitatively and…
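The abstract names a mixture-of-experts posterior but does not spell out its form. As a rough illustration only, the sketch below assumes the joint posterior is an equally weighted mixture of one expert per modality, and that each expert is a diagonal Gaussian. The function name sample_moe_posterior, the Gaussian choice, and the toy means and scales are all hypothetical and are not taken from the paper's code.

```python
import numpy as np


def sample_moe_posterior(means, scales, n_samples, rng):
    """Sample from an equally weighted mixture of per-modality experts.

    Assumption (not from the source): each expert is a diagonal Gaussian.
    means, scales: array-likes of shape (M, D), one row per modality.
    Returns samples of shape (n_samples, D) and the expert index per sample.
    """
    means = np.asarray(means, dtype=float)
    scales = np.asarray(scales, dtype=float)
    num_experts, latent_dim = means.shape

    # Choose one expert per sample with uniform probability 1/M.
    expert_idx = rng.integers(0, num_experts, size=n_samples)

    # Reparameterised draw from the chosen expert: z = mu + sigma * eps.
    eps = rng.standard_normal((n_samples, latent_dim))
    z = means[expert_idx] + scales[expert_idx] * eps
    return z, expert_idx


# Toy example: two modalities (say, image and text) with a 2-D latent space.
rng = np.random.default_rng(0)
means = [[-2.0, 0.0], [2.0, 0.0]]
scales = [[0.5, 0.5], [0.5, 0.5]]
z, expert_idx = sample_moe_posterior(means, scales, 1000, rng)
```

Because each latent sample comes from a single modality's expert, decoding it into every modality is one way to picture the cross-generation described in criterion iii.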
Taxonomy
Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Natural Language Processing Techniques
