Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts

Svetlana Kutuzova; Oswin Krause; Douglas McCloskey; Mads Nielsen; Christian Igel

arXiv:2101.07240 · cs.LG · August 2, 2021 · 6 cites

TL;DR

This paper introduces a product-of-experts (PoE) variational autoencoder for semi-supervised multimodal learning and shows empirically that it can outperform mixture-of-experts (MoE) and encoder-based alternatives at coherent joint and conditional generation of modalities.

Contribution

It proposes a novel PoE-based VAE that supports semi-supervised multimodal learning, where not all modalities are observed for every training point, and benchmarks it against an MoE approach and an approach that combines modalities with an additional encoder network.

Findings

01  PoE-based models can outperform the MoE and encoder-based models in the paper's empirical evaluation.

02  PoE models better support joint generation of multiple modalities.

03  The experiments support the intuition that PoE is better suited to a conjunctive combination of modalities.

Abstract

Multimodal generative models should be able to learn a meaningful latent representation that enables a coherent joint generation of all modalities (e.g., images and text). Many applications also require the ability to accurately sample modalities conditioned on observations of a subset of the modalities. Often not all modalities may be observed for all training data points, so semi-supervised learning should be possible. In this study, we propose a novel product-of-experts (PoE) based variational autoencoder that has these desired properties. We benchmark it against a mixture-of-experts (MoE) approach and an approach of combining the modalities with an additional encoder network. An empirical evaluation shows that the PoE-based models can outperform the contrasted models. Our experiments support the intuition that PoE models are more suited for a conjunctive combination of modalities.
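To make the "conjunctive" intuition concrete, here is a minimal sketch comparing the two ways of combining per-modality posteriors. It assumes each expert is a one-dimensional Gaussian and uses the textbook closed form for a product of Gaussians; it is not taken from the paper or its repository, and the function names `product_of_gaussians` and `mixture_moments` are hypothetical. A product acts like a logical AND (the result is sharper than any single expert), while an equal-weight mixture acts like an OR (the result is broader).

```python
from typing import Sequence, Tuple


def product_of_gaussians(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Mean and variance of the normalized product of 1-D Gaussian experts.

    Assumption for illustration: every expert is N(mean_i, variance_i).
    Precisions (inverse variances) add, and the combined mean is the
    precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    variance = 1.0 / total_precision
    mean = variance * sum(m * p for m, p in zip(means, precisions))
    return mean, variance


def mixture_moments(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Mean and variance of an equal-weight mixture of 1-D Gaussians."""
    n = len(means)
    mean = sum(means) / n
    # E[x^2] of the mixture is the average of each component's E[x^2].
    second_moment = sum(v + m * m for m, v in zip(means, variances)) / n
    return mean, second_moment - mean * mean


# Two hypothetical modality experts that disagree about the latent value.
poe_mean, poe_var = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
moe_mean, moe_var = mixture_moments([0.0, 2.0], [1.0, 1.0])
print(f"PoE: mean={poe_mean:.2f}, var={poe_var:.2f}")
print(f"MoE: mean={moe_mean:.2f}, var={moe_var:.2f}")
```

Both combinations are centered between the two experts here, but the product has a smaller variance than either expert and the mixture a larger one, which is one way to read the abstract's claim that PoE suits a conjunctive combination of modalities.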

Code & Models

Repositories

sgalkina/poe-vaes (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Topic Modeling