A survey of multimodal deep generative models

Masahiro Suzuki; Yutaka Matsuo

arXiv:2207.02127·cs.LG·July 6, 2022

TL;DR

This survey reviews recent advances in multimodal deep generative models, focusing on variational autoencoders that handle heterogeneous data and enable cross-modal generation.

Contribution

It provides a comprehensive categorization and analysis of recent studies on multimodal deep generative models based on variational autoencoders.

Findings

01. Summarizes key approaches and architectures in multimodal deep generative modeling.

02. Highlights challenges and solutions in cross-modal generation and shared representation inference.

03. Identifies future directions for research in multimodal deep generative models.

Abstract

Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and cross-modal generation via these representations; however, achieving this requires taking the heterogeneous nature of multimodal data into account. In recent years, deep generative models, i.e., generative models in which distributions are parameterized by deep neural networks, have attracted much attention, especially variational autoencoders, which are suitable for accomplishing the above challenges because they can consider heterogeneity and infer good representations of data. Therefore, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we…
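To make "inference of shared representations from arbitrary modalities" concrete, the following is a minimal sketch of one common fusion strategy in the multimodal VAE literature, a product of Gaussian experts. The abstract does not name this method, so it is an illustrative assumption here, not a description of what the survey proposes. The function name `poe_fuse`, the one-dimensional latent, and the example numbers are all hypothetical.

```python
def poe_fuse(experts, prior=(0.0, 1.0)):
    """Fuse 1-D Gaussian experts, each given as (mean, variance).

    Assumption: each modality's encoder outputs a Gaussian over the shared
    latent, and a standard-normal prior acts as one more expert.
    A missing modality is passed as None and simply skipped.
    """
    prior_mean, prior_var = prior
    # Precisions (inverse variances) add up under a product of Gaussians.
    precision = 1.0 / prior_var
    weighted_mean = prior_mean / prior_var
    for expert in experts:
        if expert is None:  # modality not observed
            continue
        mean, var = expert
        precision += 1.0 / var
        weighted_mean += mean / var
    fused_var = 1.0 / precision
    return weighted_mean * fused_var, fused_var


# Hypothetical encoder outputs for two modalities.
image_expert = (2.0, 1.0)
text_expert = (4.0, 1.0)

both = poe_fuse([image_expert, text_expert])  # both modalities observed
image_only = poe_fuse([image_expert, None])   # text modality missing
```

Because missing modalities drop out of the product, the same function yields a latent posterior for any subset of inputs. A decoder for the absent modality could then be applied to that latent for cross-modal generation, under the same assumptions.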
