Multimodal Variational Autoencoder: a Barycentric View

Peijie Qiu; Wenhui Zhu; Sayantan Kumar; Xiwen Chen; Xiaotong Sun; Jin Yang; Abolfazl Razi; Yalin Wang; Aristeidis Sotiras

arXiv:2412.20487·cs.LG·December 31, 2024

TL;DR

This paper introduces a barycentric framework for multimodal variational autoencoders. The framework admits different divergence measures, such as the Wasserstein distance, to improve the learning of shared and modality-specific representations across multiple data modalities.

Contribution

The paper provides a theoretical formulation of multimodal VAEs in terms of barycenters, extending existing methods with a flexible choice of divergence, notably the Wasserstein barycenter.

Findings

01. The Wasserstein barycenter better preserves distribution geometry.

02. The proposed method outperforms traditional PoE and MoE approaches.

03. Empirical results on three benchmarks validate its effectiveness.

Abstract

Multiple signal modalities, such as vision and sound, are naturally present in real-world phenomena. Recently, there has been growing interest in learning generative models, in particular the variational autoencoder (VAE), for multimodal representation learning, especially in the case of missing modalities. The primary goal of these models is to learn a modality-invariant and modality-specific representation that characterizes information across multiple modalities. Previous multimodal VAEs approach this mainly through the lens of experts, aggregating unimodal inference distributions with a product of experts (PoE), a mixture of experts (MoE), or a combination of both. In this paper, we provide an alternative generic and theoretical formulation of multimodal VAEs through the lens of barycenters. We first show that PoE and MoE are specific instances of barycenters, derived by…
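To make the three aggregation rules named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes each unimodal inference distribution is a diagonal Gaussian, and the function names (poe_gaussian, moe_moments, w2_barycenter_gaussian) are hypothetical. It uses standard closed forms: a product of Gaussians is a precision-weighted Gaussian, a mixture is summarized here only by its first two moments, and the 2-Wasserstein barycenter of diagonal Gaussians averages the means and the standard deviations.

```python
import numpy as np


def poe_gaussian(mus, sigmas):
    """Product of diagonal Gaussian experts (equal weights, no prior expert).

    mus, sigmas: arrays of shape (num_modalities, latent_dim).
    Returns the mean and std of the (Gaussian) product.
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    precisions = 1.0 / sigmas**2
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, np.sqrt(var)


def moe_moments(mus, sigmas, weights):
    """Mean and std of a mixture of diagonal Gaussians.

    The mixture itself is generally not Gaussian; only its first two
    moments are returned, for comparison with the other rules.
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    mu = (w * mus).sum(axis=0)
    second_moment = (w * (sigmas**2 + mus**2)).sum(axis=0)
    return mu, np.sqrt(second_moment - mu**2)


def w2_barycenter_gaussian(mus, sigmas, weights):
    """2-Wasserstein barycenter of diagonal Gaussians.

    For diagonal covariances the barycenter is Gaussian, with the
    weighted average of the means and of the standard deviations.
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    return (w * mus).sum(axis=0), (w * sigmas).sum(axis=0)


# Two hypothetical unimodal posteriors in a 1-D latent space.
mus = [[0.0], [2.0]]
sigmas = [[1.0], [1.0]]
weights = [0.5, 0.5]

poe_mu, poe_sigma = poe_gaussian(mus, sigmas)
moe_mu, moe_sigma = moe_moments(mus, sigmas, weights)
wb_mu, wb_sigma = w2_barycenter_gaussian(mus, sigmas, weights)
```

In this toy case all three rules place the mean at 1.0, but they differ in spread: the product shrinks the standard deviation to about 0.707, the mixture widens it to about 1.414, and the Wasserstein barycenter keeps it at 1.0, the scale of the inputs.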

Taxonomy

Topics: Neural Networks and Applications

Methods: Mixture of Experts