Learning Multimodal Latent Space with EBM Prior and MCMC Inference
Shiyu Yuan, Carlo Lipizzi, Tian Han

TL;DR
This paper introduces a novel multimodal generative modeling approach that combines an expressive energy-based model prior with MCMC inference in the latent space, leading to improved cross-modal generation and better modeling of complex multimodal data.
Contribution
The paper integrates an EBM prior with MCMC inference in the latent space to enhance multimodal generative modeling, a novel combination for this purpose.
Findings
An EBM prior improves the expressiveness of multimodal models.
MCMC inference with short-run Langevin dynamics refines the posterior approximation.
Experimental results show improved cross-modal and joint generation quality.
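For orientation, the two ingredients named in these findings are often written as follows in the latent-space EBM literature. This is a sketch under assumed notation, not the paper's own equations: the symbols f_\alpha (learned energy term), p_0 (base prior, typically a standard Gaussian), p_\beta (decoder), s (step size), and K (number of Langevin steps) are not taken from this summary.

```latex
% EBM prior as an exponential tilting of a base prior p_0 (assumed form)
p_\alpha(z) = \frac{1}{Z(\alpha)} \exp\big(f_\alpha(z)\big)\, p_0(z)

% Short-run Langevin dynamics targeting the posterior over z given data x
z_{k+1} = z_k + \frac{s^2}{2}\, \nabla_z \log p_\theta(z_k \mid x) + s\, \epsilon_k,
\qquad \epsilon_k \sim \mathcal{N}(0, I), \quad k = 0, \dots, K-1

% where, up to an additive constant in z,
\log p_\theta(z \mid x) = f_\alpha(z) + \log p_0(z) + \log p_\beta(x \mid z) + \text{const}
```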
Abstract
Multimodal generative models are crucial for various applications. We propose an approach that combines an expressive energy-based model (EBM) prior with Markov Chain Monte Carlo (MCMC) inference in the latent space for multimodal generation. The EBM prior acts as an informative guide, while MCMC inference, specifically through short-run Langevin dynamics, brings the posterior distribution closer to its true form. This method not only provides an expressive prior to better capture the complexity of multimodality but also improves the learning of shared latent variables for more coherent generation across modalities. Our proposed method is supported by empirical experiments, underscoring the effectiveness of our EBM prior with MCMC inference in enhancing cross-modal and joint generative tasks in multimodal contexts.
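To make the short-run Langevin step mentioned in the abstract concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration, not the authors' implementation: a fixed linear decoder W, a quadratic stand-in for the learned energy term, a standard Gaussian base prior, and the hypothetical helper names grad_log_posterior, short_run_langevin, and posterior_mean. The toy is chosen so that the posterior is Gaussian and its mean is available in closed form for comparison.

```python
import random

# Toy setup (all values are illustrative assumptions).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3x2 linear "decoder": x ~ N(W z, SIGMA^2 I)
SIGMA = 0.5                                # observation noise standard deviation
M = [0.5, 0.5]                             # centre of the quadratic energy f(z) = -0.5 * ||z - M||^2
X = [1.0, -0.5, 0.7]                       # one observed data point


def grad_log_posterior(z):
    """Gradient of f(z) + log N(z; 0, I) + log N(X; W z, SIGMA^2 I) with respect to z."""
    resid = [X[i] - sum(W[i][j] * z[j] for j in range(2)) for i in range(3)]
    grad = []
    for j in range(2):
        energy_term = -(z[j] - M[j])   # from the (stand-in) EBM energy
        base_term = -z[j]              # from the standard Gaussian base prior
        lik_term = sum(W[i][j] * resid[i] for i in range(3)) / SIGMA ** 2
        grad.append(energy_term + base_term + lik_term)
    return grad


def short_run_langevin(grad_fn, z0, n_steps=100, step_size=0.2, rng=None):
    """Run a fixed, small number of Langevin steps starting from z0."""
    if rng is None:
        rng = random.Random()
    z = list(z0)
    for _ in range(n_steps):
        g = grad_fn(z)
        # Gradient drift plus Gaussian noise, scaled by the step size.
        z = [zj + 0.5 * step_size ** 2 * gj + step_size * rng.gauss(0.0, 1.0)
             for zj, gj in zip(z, g)]
    return z


def posterior_mean():
    """Closed-form posterior mean for this Gaussian toy, used only as a reference."""
    # Precision P = 2 I + W^T W / SIGMA^2, linear term b = M + W^T X / SIGMA^2.
    p = [[(2.0 if a == b else 0.0)
          + sum(W[i][a] * W[i][b] for i in range(3)) / SIGMA ** 2
          for b in range(2)] for a in range(2)]
    b = [M[j] + sum(W[i][j] * X[i] for i in range(3)) / SIGMA ** 2 for j in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    return [(p[1][1] * b[0] - p[0][1] * b[1]) / det,
            (p[0][0] * b[1] - p[1][0] * b[0]) / det]


if __name__ == "__main__":
    rng = random.Random(0)
    chains = [short_run_langevin(grad_log_posterior, [0.0, 0.0], rng=rng)
              for _ in range(2000)]
    empirical = [sum(c[j] for c in chains) / len(chains) for j in range(2)]
    print("empirical mean of chains:", empirical)
    print("closed-form posterior mean:", posterior_mean())
```

In a real model the energy term and decoder would be neural networks and the gradient would come from automatic differentiation; the analytic gradient here only keeps the sketch dependency-free.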
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: energy-based model
