Learning Multimodal Latent Space with EBM Prior and MCMC Inference

Shiyu Yuan; Carlo Lipizzi; Tian Han

arXiv:2408.10467 · cs.LG · August 21, 2024

TL;DR

This paper introduces a novel multimodal generative modeling approach that combines an expressive energy-based model prior with MCMC inference in the latent space, leading to improved cross-modal generation and better modeling of complex multimodal data.

Contribution

The paper integrates an EBM prior with MCMC inference in the latent space of a multimodal generative model, a combination the authors present as novel for this setting.
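For orientation, an EBM prior over a shared latent vector is often written as an exponential tilting of a simple reference distribution, and MCMC inference then targets the posterior that this prior induces. The form below is a sketch borrowed from the general latent-space EBM literature, not a formula quoted on this page. The symbols z, x_1, ..., x_M, f_\alpha, p_0, and \beta_m are illustrative, and the factorization over modalities (conditional independence given z) is an assumption.

```latex
% Assumed form: EBM prior as a tilted reference distribution p_0 (e.g. a standard Gaussian)
p_\alpha(z) = \frac{1}{Z(\alpha)} \exp\!\big(f_\alpha(z)\big)\, p_0(z)

% Assumed posterior over the shared latent z given M modalities
p(z \mid x_1, \dots, x_M) \;\propto\; p_\alpha(z) \prod_{m=1}^{M} p_{\beta_m}(x_m \mid z)
```

Under this reading, a more flexible f_\alpha lets the prior place mass on several separated regions of the latent space, which a single Gaussian prior cannot do.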

Findings

01  EBM prior improves the expressiveness of multimodal models

02  MCMC inference with Langevin dynamics refines the posterior approximation

03  Experimental results show enhanced cross-modal and joint generation quality

Abstract

Multimodal generative models are crucial for various applications. We propose an approach that combines an expressive energy-based model (EBM) prior with Markov Chain Monte Carlo (MCMC) inference in the latent space for multimodal generation. The EBM prior acts as an informative guide, while MCMC inference, specifically through short-run Langevin dynamics, brings the posterior distribution closer to its true form. This method not only provides an expressive prior to better capture the complexity of multimodality but also improves the learning of shared latent variables for more coherent generation across modalities. Our proposed method is supported by empirical experiments, underscoring the effectiveness of our EBM prior with MCMC inference in enhancing cross-modal and joint generative tasks in multimodal contexts.
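The short-run Langevin dynamics mentioned in the abstract can be illustrated with a minimal sketch. It runs a fixed, small number of noisy gradient steps on a log-density: each step moves z by half the squared step size times the gradient, plus Gaussian noise scaled by the step size. Everything here is an assumption made for illustration and not the authors' implementation: the one-dimensional latent, the toy bimodal log-density in `log_prior_grad`, the function names, and the default step count and step size. The paper's sampler would also include the gradient of the generator likelihood when sampling from the posterior.

```python
import math
import random


def log_prior_grad(z):
    """Gradient of a toy log-density, standing in for an EBM prior.

    Hypothetical choice: log p(z) = -(z^2 - 4)^2 / 8 - z^2 / 2 + const,
    i.e. an energy term tilting a standard Gaussian. It has two modes
    at z = +sqrt(2) and z = -sqrt(2).
    """
    return -z * (z * z - 4.0) / 2.0 - z


def short_run_langevin(grad_log_p, z0, n_steps=20, step_size=0.1,
                       rng=None, add_noise=True):
    """Run a fixed number of Langevin steps starting from z0.

    Update rule: z <- z + (s^2 / 2) * grad log p(z) + s * eps, eps ~ N(0, 1).
    With add_noise=False this reduces to plain gradient ascent on log p.
    """
    rng = rng or random.Random()
    z = z0
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0) if add_noise else 0.0
        z = z + 0.5 * step_size ** 2 * grad_log_p(z) + step_size * eps
    return z


# Usage: initialise chains from the Gaussian reference and refine each
# with a short chain of K = 20 steps.
rng = random.Random(0)
samples = [
    short_run_langevin(log_prior_grad, rng.gauss(0.0, 1.0), rng=rng)
    for _ in range(5)
]
```

The chain is "short-run" in that it is stopped after a fixed number of steps rather than run to convergence, trading exactness for a bounded sampling cost per training iteration.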

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: energy-based model