Generalized Multimodal ELBO

Thomas M. Sutter; Imant Daunhawer; Julia E. Vogt

arXiv:2105.02470 · cs.LG · June 28, 2021

TL;DR

This paper introduces a generalized ELBO for multimodal data. It removes the trade-off that existing self-supervised generative models face between semantic coherence and learning the joint data distribution.

Contribution

A new, generalized ELBO formulation for multimodal data that encompasses two previous methods as special cases and combines their benefits in self-supervised generative learning.

Findings

01 Outperforms state-of-the-art models in generative tasks

02 Encompasses previous methods as special cases

03 Enhances semantic coherence and joint distribution learning

Abstract

Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.
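The abstract does not spell out the posterior approximation, so the following is only a minimal sketch of one way to read it, taking the repository name (MoPoE, mixture of products of experts) as a hint. It assumes diagonal Gaussian unimodal posteriors, a product of experts within each non-empty subset of modalities, and a uniform mixture over those subsets. The function names, the uniform weights, and the omission of a prior expert are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def product_of_gaussian_experts(mus, logvars):
    """Combine diagonal Gaussian experts by precision-weighted averaging."""
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, np.log(joint_var)


def subset_posteriors(unimodal):
    """Build one product-of-experts posterior per non-empty modality subset.

    `unimodal` maps a (hypothetical) modality name to a (mu, logvar) pair.
    """
    names = sorted(unimodal)
    posteriors = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            mus = [unimodal[name][0] for name in subset]
            logvars = [unimodal[name][1] for name in subset]
            posteriors[subset] = product_of_gaussian_experts(mus, logvars)
    return posteriors


def sample_from_mixture(posteriors, rng):
    """Draw z from the mixture: pick a subset uniformly (assumed), then sample."""
    keys = list(posteriors)
    mu, logvar = posteriors[keys[rng.integers(len(keys))]]
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unimodal = {
        "image": (np.zeros(2), np.zeros(2)),
        "text": (np.full(2, 2.0), np.zeros(2)),
        "audio": (np.ones(2), np.zeros(2)),
    }
    posteriors = subset_posteriors(unimodal)
    print(len(posteriors), "subset posteriors")
    print(sample_from_mixture(posteriors, rng))
```

Under this reading, keeping only the single-modality subsets gives a plain mixture of experts, and keeping only the full set gives a plain product of experts, which is one way the two earlier methods could arise as special cases.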

Code & Models

Repositories

thomassutter/MoPoE (PyTorch, official)

Videos

Generalized Multimodal ELBO · SlidesLive