CoVAE: correlated multimodal generative modeling
Federico Caretti, Guido Sanguinetti

TL;DR
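CoVAE is a generative model that captures the correlations between modalities in multimodal data, giving accurate cross-modal reconstruction and effective quantification of the associated uncertainties, where the latent-space fusion strategies of existing multimodal VAEs destroy the joint statistical structure.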
Contribution
This work presents CoVAE, a new architecture that preserves the joint statistical structure of multimodal data, addressing a limitation of the latent-space fusion strategies used by previous multimodal VAEs.
Findings
Accurate cross-modal reconstruction demonstrated.
Effective uncertainty quantification achieved.
Works on both real and synthetic datasets.
Abstract
Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on a number of real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
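To make concrete why correlation between modalities matters for cross-modal prediction and its uncertainty, here is a minimal sketch using standard Gaussian conditioning. It is an illustration only and not the CoVAE architecture, which the abstract does not specify. The two-dimensional joint Gaussian, the correlation value `rho`, and the helper `conditional_gaussian` are all assumptions introduced for this example.

```python
import numpy as np

def conditional_gaussian(mu, cov, idx_obs, idx_mis, x_obs):
    """Mean and covariance of x[idx_mis] given x[idx_obs] = x_obs
    under a joint Gaussian N(mu, cov). Hypothetical helper for illustration."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    s_oo = cov[np.ix_(idx_obs, idx_obs)]  # covariance of the observed block
    s_mo = cov[np.ix_(idx_mis, idx_obs)]  # cross-covariance missing/observed
    s_mm = cov[np.ix_(idx_mis, idx_mis)]  # covariance of the missing block
    # gain = s_mo @ inv(s_oo), computed with a solve (s_oo is symmetric)
    gain = np.linalg.solve(s_oo, s_mo.T).T
    cond_mean = mu[idx_mis] + gain @ (np.asarray(x_obs, dtype=float) - mu[idx_obs])
    cond_cov = s_mm - gain @ s_mo.T
    return cond_mean, cond_cov

# Toy setting (assumed): one scalar variable per modality, unit variances.
rho = 0.8
mu = np.zeros(2)
cov_correlated = np.array([[1.0, rho], [rho, 1.0]])
cov_independent = np.eye(2)

# Observe modality 0 and predict modality 1.
mean_corr, cov_corr = conditional_gaussian(mu, cov_correlated, [0], [1], [1.5])
mean_ind, cov_ind = conditional_gaussian(mu, cov_independent, [0], [1], [1.5])

print("correlated :", mean_corr, cov_corr)
print("independent:", mean_ind, cov_ind)
```

In this toy setting, the correlated model shifts its prediction for the unobserved variable toward the observed value and reports a smaller variance, while the independent model ignores the observation and keeps the prior variance. A model that discards the joint structure therefore loses both the informative prediction and the calibrated uncertainty.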
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Tensor decomposition and applications · Face recognition and analysis
