Multimodal Generative Models for Scalable Weakly-Supervised Learning

Mike Wu; Noah Goodman

arXiv:1802.05335 · cs.LG · November 13, 2018 · 150 cites

TL;DR

This paper introduces a multimodal variational autoencoder (MVAE) that efficiently learns a joint representation from multiple modalities, even when some modalities are missing, and demonstrates it on tasks including image transformations and machine translation.

Contribution

The paper presents a novel MVAE with a product-of-experts inference network and shared parameters, enabling scalable weakly-supervised learning with incomplete data.

Findings

01

Matches state-of-the-art performance on four datasets with many fewer parameters

02

Robust to incomplete supervision and missing modalities

03

Effective across diverse tasks such as image transformations and machine translation

Abstract

Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities. We apply the MVAE on four datasets and match state-of-the-art performance using many fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning, and is robust to incomplete supervision. We then consider two case studies, one of…
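
The product-of-experts inference mentioned in the abstract can be illustrated with a small sketch. It assumes each modality's encoder outputs a diagonal Gaussian over the latent variable and that a standard normal prior is included as an extra expert; a missing modality is handled by leaving its expert out of the product. The function names (`product_of_experts`, `infer_joint`) and the dictionary input format are hypothetical, chosen for illustration, and are not taken from the authors' code.

```python
import numpy as np


def product_of_experts(mus, logvars, eps=1e-8):
    """Combine diagonal Gaussian experts into a single Gaussian.

    mus, logvars: array-likes of shape (n_experts, latent_dim).
    A product of Gaussians is Gaussian: precisions add, and the mean
    is the precision-weighted average of the expert means.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    precisions = 1.0 / (np.exp(logvars) + eps)  # eps guards against division by zero
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (mus * precisions).sum(axis=0)
    return joint_mu, joint_var


def infer_joint(expert_outputs, latent_dim):
    """Build the joint posterior from whichever modalities are present.

    expert_outputs: dict mapping a modality name to (mu, logvar),
    or to None when that modality is missing (hypothetical format).
    """
    # Assumption: a standard normal prior N(0, I) acts as one more expert.
    mus = [np.zeros(latent_dim)]
    logvars = [np.zeros(latent_dim)]
    for output in expert_outputs.values():
        if output is None:  # missing modality: drop its expert
            continue
        mu, logvar = output
        mus.append(np.asarray(mu, dtype=float))
        logvars.append(np.asarray(logvar, dtype=float))
    return product_of_experts(mus, logvars)


if __name__ == "__main__":
    # Two modalities, one of them missing for this example.
    outputs = {
        "image": (np.array([2.0, 2.0]), np.array([0.0, 0.0])),
        "text": None,
    }
    mu, var = infer_joint(outputs, latent_dim=2)
    print(mu, var)
```

With only the prior and one unit-variance expert centered at 2, the joint mean comes out at roughly 1 and the joint variance at roughly 0.5. Because the same encoders are reused for every subset of observed modalities, no separate inference network is needed per missing-data pattern, which is the parameter sharing the abstract refers to.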

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques