Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives
Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega

TL;DR
This paper introduces a new approach for multi-modal generative models using permutation-invariant encoders and a tighter variational objective, improving the approximation of joint distributions across multiple data modalities.
Contribution
The paper proposes flexible aggregation schemes built on permutation-invariant neural networks, together with a variational objective that more tightly approximates the data log-likelihood in multi-modal VAEs.
Findings
Flexible aggregation schemes outperform traditional Product-of-Experts (PoE) and Mixture-of-Experts (MoE) aggregation.
Tighter variational objectives improve joint distribution approximation.
Permutation-invariant encoders enhance multi-modal data modeling.
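As a rough illustration of what a permutation-invariant encoder over modalities can look like, here is a minimal sum-pooling sketch in the style of Deep Sets. It is not taken from the paper: the function names, the single-layer maps, and the assumption that every modality has already been embedded into a common feature dimension are all hypothetical choices made for this example.

```python
import numpy as np

def init_params(rng, feat_dim, hidden_dim, latent_dim):
    # Hypothetical parameters: one shared per-modality map (phi)
    # and one map applied after pooling (rho).
    return {
        "W_phi": rng.standard_normal((feat_dim, hidden_dim)) * 0.1,
        "b_phi": np.zeros(hidden_dim),
        "W_rho": rng.standard_normal((hidden_dim, 2 * latent_dim)) * 0.1,
        "b_rho": np.zeros(2 * latent_dim),
    }

def sum_pooling_encoder(params, features, mask):
    """Map a set of modality features to Gaussian parameters.

    features: (num_modalities, feat_dim), assumed pre-embedded per modality.
    mask:     (num_modalities,), 1.0 for observed modalities, 0.0 otherwise.
    """
    # Apply the same map to every modality feature.
    h = np.tanh(features @ params["W_phi"] + params["b_phi"])
    # Sum over observed modalities only; a sum ignores ordering.
    pooled = (mask[:, None] * h).sum(axis=0)
    out = pooled @ params["W_rho"] + params["b_rho"]
    mu, logvar = np.split(out, 2)
    return mu, logvar

rng = np.random.default_rng(0)
params = init_params(rng, feat_dim=4, hidden_dim=8, latent_dim=2)
feats = rng.standard_normal((3, 4))
mask = np.array([1.0, 1.0, 0.0])  # third modality is missing
mu, logvar = sum_pooling_encoder(params, feats, mask)
```

Because the pooling step is a masked sum, reordering the modalities leaves the output unchanged up to floating-point error, and the same network handles any subset of observed modalities.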
Abstract
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations that jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. To encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational objective that can tightly approximate the data log-likelihood. We develop more flexible aggregation…
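For readers unfamiliar with the two baseline aggregation schemes named in the abstract, the sketch below shows how they are commonly implemented for diagonal Gaussian experts. It is an illustrative assumption, not the paper's code: the function names, the optional standard normal prior expert, and the uniform mixture weights are choices made for this example.

```python
import numpy as np

def poe_gaussian(mus, logvars, include_prior=True):
    """Product of diagonal Gaussian experts.

    mus, logvars: shape (num_modalities, latent_dim).
    Returns the mean and variance of the (renormalised) product.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    if include_prior:
        # Assumed N(0, I) prior expert: zero mean, zero log-variance.
        mus = np.vstack([np.zeros((1, mus.shape[1])), mus])
        logvars = np.vstack([np.zeros((1, logvars.shape[1])), logvars])
    precisions = np.exp(-logvars)
    # Precisions add; the mean is the precision-weighted average.
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, joint_var

def moe_sample(mus, logvars, rng):
    """Draw one sample from a uniform mixture of Gaussian experts."""
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    k = rng.integers(mus.shape[0])  # pick one expert uniformly
    eps = rng.standard_normal(mus.shape[1])
    return mus[k] + np.exp(0.5 * logvars[k]) * eps

# Two unit-variance experts with means 1 and 3, no prior expert:
mu, var = poe_gaussian([[1.0], [3.0]], [[0.0], [0.0]], include_prior=False)
# mu is [2.0] and var is [0.5]
```

The product sharpens the posterior as more modalities are observed, while the mixture only ever uses one expert per sample; this difference is one source of the trade-offs between the two schemes that the abstract mentions.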
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computational and Text Analysis Methods · Topic Modeling
