Multimodal Generative Models for Compositional Representation Learning

Mike Wu; Noah Goodman

arXiv:1912.05075 · cs.LG · December 12, 2019 · 6 cites

TL;DR

This paper introduces a family of multimodal deep generative models, derived from variational bounds on the evidence, that combine modalities such as image and text; the resulting objectives and mixed model types improve representation learning and downstream task performance.

Contribution

It derives a framework for multimodal generative models from variational bounds on the evidence, generalizes the objective to several types of deep generative model (VAE, GAN, and flow-based), and reports improved performance and interpretability across multiple datasets.

Findings

01. Multimodal VAEs outperform previous models with and without weak supervision.

02. Combining GANs with VAEs enhances image and text generation quality.

03. Language influences image representations, making them more abstract and compositional.

Abstract

As deep neural networks become more adept at traditional tasks, many of the most exciting new challenges concern multimodality: observations that combine diverse types, such as image and text. In this paper, we introduce a family of multimodal deep generative models derived from variational bounds on the evidence (data marginal likelihood). As part of our derivation we find that many previous multimodal variational autoencoders used objectives that do not correctly bound the joint marginal likelihood across modalities. We further generalize our objective to work with several types of deep generative model (VAE, GAN, and flow-based), and allow use of different model types for different modalities. We benchmark our models across many image, label, and text datasets, and find that our multimodal VAEs excel with and without weak supervision. Additional improvements come from use of GAN…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Convolution