Bridging the inference gap in Multimodal Variational Autoencoders
Agathe Senellart, Stéphanie Allassonnière

TL;DR
This paper introduces an interpretable multimodal VAE model that improves generation quality by modeling joint and conditional distributions separately and leveraging shared information between modalities, achieving state-of-the-art results.
Contribution
The paper presents a novel model that avoids mixture aggregation, uses variational inference and Normalizing Flows, and enhances conditional coherence by exploiting shared modality information.
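The contribution above fits the joint distribution with variational inference. The following NumPy sketch is only an illustration, not the authors' implementation. It computes a single-sample evidence lower bound (ELBO) for a joint encoder q(z | x_1, ..., x_M) that reads all modalities at once. The linear encoder and decoders, the unit-variance Gaussian likelihoods, the standard normal prior and every function name here are assumptions made for the example.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def gaussian_log_likelihood(x, mean, sigma=1.0):
    """log N(x | mean, sigma^2 I), summed over feature dims (assumed likelihood)."""
    d = x.shape[-1]
    return (-0.5 * np.sum((x - mean) ** 2, axis=-1) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

def joint_elbo(modalities, encoder, decoders, rng):
    """Single-sample Monte Carlo ELBO for a hypothetical joint encoder q(z | x_1..x_M)."""
    mu, logvar = encoder(np.concatenate(modalities, axis=-1))
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    # Sum the reconstruction log-likelihood of every modality from the shared z.
    recon = sum(gaussian_log_likelihood(x, dec(z)) for x, dec in zip(modalities, decoders))
    return recon - gaussian_kl_to_standard_normal(mu, logvar)

# Usage with toy data and random (untrained) linear maps.
rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 3))  # batch of 4, modality 1 has 3 features
x2 = rng.standard_normal((4, 5))  # modality 2 has 5 features
latent_dim = 2
W_enc = rng.standard_normal((8, 2 * latent_dim)) * 0.1

def encoder(x):
    h = x @ W_enc
    return h[:, :latent_dim], h[:, latent_dim:]  # (mean, log-variance)

W1 = rng.standard_normal((latent_dim, 3))
W2 = rng.standard_normal((latent_dim, 5))
decoders = [lambda z: z @ W1, lambda z: z @ W2]

elbo = joint_elbo([x1, x2], encoder, decoders, rng)
print(elbo.shape)  # (4,)
```

In practice the encoder and decoders would be neural networks trained by maximizing the ELBO averaged over the batch.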
Findings
Achieves state-of-the-art results on benchmark datasets.
Outperforms mixture-of-experts models in complex data scenarios.
Improves conditional generation quality through shared information extraction.
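The mixture-of-experts baselines mentioned in the findings aggregate unimodal posteriors into a mixture. Below is a minimal sketch of that aggregation. It assumes diagonal Gaussian unimodal posteriors q_j(z | x_j) with equal weights 1/M, and the function names are hypothetical. A sample from the mixture always comes from a single unimodal posterior, so it can only reflect what that one modality encodes. This is one intuition for why such aggregation can limit generation quality.

```python
import numpy as np

def diag_gaussian_logpdf(z, mu, logvar):
    """log N(z | mu, diag(exp(logvar))), summed over the last axis."""
    return -0.5 * np.sum(logvar + np.log(2 * np.pi) + (z - mu) ** 2 / np.exp(logvar), axis=-1)

def moe_logpdf(z, mus, logvars):
    """log of (1/M) * sum_j q_j(z | x_j), computed stably with log-sum-exp."""
    comps = np.stack([diag_gaussian_logpdf(z, m, lv) for m, lv in zip(mus, logvars)])
    m = comps.max(axis=0)
    return m + np.log(np.mean(np.exp(comps - m), axis=0))

def moe_sample(mus, logvars, rng):
    """Pick one expert uniformly at random, then sample from its Gaussian."""
    j = rng.integers(len(mus))
    return mus[j] + np.exp(0.5 * logvars[j]) * rng.standard_normal(mus[j].shape)

# Two 1-D unimodal posteriors centred at -2 and +2 with unit variance.
rng = np.random.default_rng(0)
mus = [np.array([-2.0]), np.array([2.0])]
logvars = [np.zeros(1), np.zeros(1)]
print(round(float(moe_logpdf(np.array([0.0]), mus, logvars)), 3))  # -2.919
sample = moe_sample(mus, logvars, rng)  # drawn from exactly one of the two experts
```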
Abstract
From medical diagnosis to autonomous vehicles, critical applications rely on the integration of multiple heterogeneous data modalities. Multimodal Variational Autoencoders offer versatile and scalable methods for generating unobserved modalities from observed ones. Recent models using mixtures-of-experts aggregation suffer from theoretically grounded limitations that restrict their generation quality on complex datasets. In this article, we propose a novel interpretable model able to learn both joint and conditional distributions without introducing mixture aggregation. Our model follows a multistage training process: first modeling the joint distribution with variational inference and then modeling the conditional distributions with Normalizing Flows to better approximate true posteriors. Importantly, we also propose to extract and leverage the information shared between modalities to…
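The second training stage models each conditional distribution with a Normalizing Flow. This page does not specify the flow architecture, so the sketch below swaps in a generic RealNVP-style affine coupling flow written in NumPy. The base Gaussian (`base_mu`, `base_logvar`) stands in for the output of a hypothetical unimodal encoder for x_j, and the weights are random rather than trained. The sketch shows how the change-of-variables formula gives an exact log-density: log q(z | x_j) = log N(u; mu, sigma^2) - sum of log|det J| over the forward layers.

```python
import numpy as np

def diag_gaussian_logpdf(z, mu, logvar):
    """log N(z | mu, diag(exp(logvar))), summed over the last axis."""
    return -0.5 * np.sum(logvar + np.log(2 * np.pi) + (z - mu) ** 2 / np.exp(logvar), axis=-1)

class AffineCoupling:
    """The first half of the input passes through unchanged and sets a scale
    and shift for the second half, so the Jacobian is triangular."""
    def __init__(self, dim, rng):
        self.d = dim // 2
        self.Ws = rng.standard_normal((self.d, dim - self.d)) * 0.1  # hypothetical weights
        self.Wt = rng.standard_normal((self.d, dim - self.d)) * 0.1

    def forward(self, u):
        u1, u2 = u[..., :self.d], u[..., self.d:]
        s = np.tanh(u1 @ self.Ws)  # bounded log-scale
        t = u1 @ self.Wt
        z = np.concatenate([u1, u2 * np.exp(s) + t], axis=-1)
        return z, np.sum(s, axis=-1)  # log|det J| of the forward map

    def inverse(self, z):
        z1, z2 = z[..., :self.d], z[..., self.d:]
        s = np.tanh(z1 @ self.Ws)
        t = z1 @ self.Wt
        u = np.concatenate([z1, (z2 - t) * np.exp(-s)], axis=-1)
        return u, -np.sum(s, axis=-1)  # log|det J| of the inverse map

def flow_sample(layers, base_mu, base_logvar, rng, n):
    """Sample z and its exact log-density by pushing base samples forward."""
    x = base_mu + np.exp(0.5 * base_logvar) * rng.standard_normal((n, base_mu.shape[-1]))
    log_q = diag_gaussian_logpdf(x, base_mu, base_logvar)
    for layer in layers:
        x, log_det = layer.forward(x)
        x = x[..., ::-1]  # reverse dims so the next layer transforms the other half
        log_q -= log_det
    return x, log_q

def flow_log_prob(z, layers, base_mu, base_logvar):
    """Evaluate log q(z) by inverting the flow back to the base distribution."""
    log_det_inv = np.zeros(z.shape[:-1])
    for layer in reversed(layers):
        z = z[..., ::-1]  # undo the reversal applied after this layer
        z, log_det = layer.inverse(z)
        log_det_inv += log_det
    return diag_gaussian_logpdf(z, base_mu, base_logvar) + log_det_inv

rng = np.random.default_rng(0)
layers = [AffineCoupling(4, rng) for _ in range(3)]
base_mu, base_logvar = np.zeros(4), np.zeros(4)  # stand-in for encoder outputs
z, log_q = flow_sample(layers, base_mu, base_logvar, rng, n=5)
print(np.allclose(flow_log_prob(z, layers, base_mu, base_logvar), log_q))  # True
```

Reversing the coordinates after each coupling layer lets successive layers transform different halves. The reversal is a permutation, so it adds nothing to the log-determinant.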
Taxonomy
Methods: Variational Inference · Normalizing Flows
