Mixture-of-experts VAEs can disregard variation in surjective multimodal data
Jannik Wolff, Tassilo Klein, Moin Nabi, Rahul G. Krishnan, Shinichi Nakajima

TL;DR
This paper analyzes a limitation of multimodal VAEs with a mixture-of-experts posterior on surjective multimodal data, showing that they can fail to capture the variability in such data.
Contribution
It provides both theoretical and empirical evidence that mixture-of-experts VAEs can struggle to capture variability in surjective multimodal data, highlighting a key challenge in multimodal generative modeling.
Findings
Mixture-of-experts VAEs can ignore variation in surjective data.
Theoretical analysis explains why mixture-of-experts VAEs can fail in these settings.
Empirical results confirm the limitations in practical settings.
Abstract
Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational autoencoders (VAEs) that generate several modalities. We consider surjective data, where single datapoints from one modality (such as class labels) describe multiple datapoints from another modality (such as images). We theoretically and empirically demonstrate that multimodal VAEs with a mixture of experts posterior can struggle to capture variability in such surjective data.
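To make the abstract's terms concrete, here is a toy sketch under stated assumptions; it is not the authors' model or proof. Scalar "images" x are drawn around a class mean for each label y, which gives a surjective label-to-image mapping. A mixture-of-experts posterior samples z from either an image expert q(z|x) or a label expert q(z|y) with equal weight. When z comes from a well-separated label expert, it carries information about y but not about the particular x, so under a fixed-variance Gaussian likelihood the optimal decoder output is the class mean E[x|y], and the generated images have no within-class variation. The function names, the Gaussian experts, and the one-dimensional data are all hypothetical choices made for illustration.

```python
import numpy as np


def make_surjective_data(n_per_class, class_means, within_std, rng):
    """Toy surjective dataset: each label y maps to many scalar 'images' x.

    Hypothetical stand-in for label/image pairs: x = class_means[y] + noise.
    """
    labels = np.repeat(np.arange(len(class_means)), n_per_class)
    noise = within_std * rng.standard_normal(labels.shape[0])
    images = np.asarray(class_means, dtype=float)[labels] + noise
    return labels, images


def sample_moe_posterior(mu_x, sd_x, mu_y, sd_y, rng):
    """Draw one z per datapoint from q(z | x, y) = 0.5 q(z | x) + 0.5 q(z | y),
    assuming each unimodal expert is a Gaussian with the given mean and sd."""
    use_x = rng.random(mu_x.shape) < 0.5  # choose an expert uniformly per datapoint
    mu = np.where(use_x, mu_x, mu_y)
    sd = np.where(use_x, sd_x, sd_y)
    return mu + sd * rng.standard_normal(mu.shape)


def optimal_label_to_image_decoder(labels, images):
    """Squared-error-optimal image prediction when the latent depends only on y.

    Assumes well-separated label experts q(z | y): a decoder fed z ~ q(z | y)
    can recover y but nothing about the specific x, so the minimiser of the
    expected squared error (max log-likelihood for a fixed-variance Gaussian)
    is E[x | y], the per-class mean.
    """
    return {int(c): float(images[labels == c].mean()) for c in np.unique(labels)}


rng = np.random.default_rng(0)
labels, images = make_surjective_data(1000, [-2.0, 2.0], 0.5, rng)
decoder = optimal_label_to_image_decoder(labels, images)
# Images "generated" from label-derived latents: one output per label.
generated = np.array([decoder[int(y)] for y in labels])

for c in np.unique(labels):
    data_sd = images[labels == c].std()
    gen_sd = generated[labels == c].std()
    print(f"class {c}: data within-class sd={data_sd:.3f}, generated sd={gen_sd:.3f}")
```

The sketch isolates only the label-to-image reconstruction term that the mixture posterior introduces for about half of the latent samples; a full mixture-of-experts VAE also trains the image expert, and how much within-class detail its latents retain depends on the rest of the objective, which this toy does not model.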
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
