Learning Multimodal Latent Generative Models with Energy-Based Prior
Shiyu Yuan, Jiali Cui, Hanao Li, and Tian Han

TL;DR
This paper introduces a novel multimodal generative model that integrates energy-based models as priors, resulting in more expressive representations and improved cross-modal generation coherence.
Contribution
The paper proposes a framework that combines multimodal latent generative models with an energy-based model (EBM) prior. The two are trained jointly via a variational scheme, which makes the prior more expressive.
Findings
Superior generation coherence demonstrated in experiments
More expressive and informative priors captured across modalities
Effective joint training of multimodal models with EBMs
Abstract
Multimodal generative models have recently gained significant attention for their ability to learn representations across various modalities, enhancing joint and cross-generation coherence. However, most existing works use standard Gaussian or Laplacian distributions as priors, which may struggle to capture the diverse information inherent in multiple data types due to their unimodal and less informative nature. Energy-based models (EBMs), known for their expressiveness and flexibility across various tasks, have yet to be thoroughly explored in the context of multimodal generative models. In this paper, we propose a novel framework that integrates the multimodal latent generative model with the EBM. Both models can be trained jointly through a variational scheme. This approach results in a more expressive and informative prior that better captures information across multiple modalities.…
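The summary does not spell out the form of the energy-based prior. A common construction in latent EBM work, assumed here and not confirmed by this page, is an exponentially tilted Gaussian, p_α(z) ∝ exp(f_α(z)) · N(z; 0, I), sampled with short-run Langevin dynamics. The minimal NumPy sketch below illustrates only why such a prior can be more expressive than a plain Gaussian. The correction f(z) = Σ log cosh(k·z) is a hand-picked, hypothetical stand-in for a learned network f_α, and the helper names `grad_log_prior` and `langevin_sample` are illustrative. The sketch does not cover the multimodal encoders and decoders or the joint variational training.

```python
import numpy as np


def grad_log_prior(z, k=2.0):
    """Gradient of log p(z) for the tilted prior p(z) ∝ exp(f(z)) * N(z; 0, I).

    Hypothetical energy correction: f(z) = sum(log cosh(k * z)). It is chosen
    only because its gradient is closed-form; a real model would learn f_alpha.
    """
    # d/dz log cosh(k z) = k * tanh(k z); d/dz log N(z; 0, I) = -z
    return k * np.tanh(k * z) - z


def langevin_sample(n_chains, dim, n_steps=500, step_size=0.3, k=2.0, seed=0):
    """Unadjusted Langevin dynamics, with chains initialized from the Gaussian base prior."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_chains, dim))  # start from p_0 = N(0, I)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        # z <- z + (s^2 / 2) * grad log p(z) + s * eps
        z = z + 0.5 * step_size**2 * grad_log_prior(z, k) + step_size * noise
    return z


if __name__ == "__main__":
    samples = langevin_sample(n_chains=5000, dim=1)
    base = np.random.default_rng(1).standard_normal((5000, 1))
    # The base Gaussian is unimodal around 0; the tilted prior splits into modes near +/- k.
    print("base E|z|:     ", np.abs(base).mean())
    print("tilted E|z|:   ", np.abs(samples).mean())
    print("fraction z > 0:", (samples > 0).mean())
```

With k = 2, the tilted density cosh(k·z)·exp(-z²/2) is exactly an equal-weight mixture of N(k, 1) and N(-k, 1) in each dimension. The sampled E|z| should therefore land near 2, while the unimodal base Gaussian gives roughly √(2/π) ≈ 0.8. The unadjusted Langevin update carries a small step-size bias, so the step size is kept modest.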
