Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention
Daniel Mensing, Jan Kapar, Jochen G. Hirsch, Matthias Günther, Horst Hahn, Marvin N. Wright

TL;DR
This paper introduces a novel multimodal latent diffusion model that jointly synthesizes MRI and tabular clinical data within a shared latent space, enabling coherent and high-fidelity generation of multimodal healthcare data.
Contribution
The paper is the first to demonstrate joint modeling of MRI and mixed-type tabular data using a diffusion framework with a shared latent space, advancing multimodal generative modeling in healthcare.
Findings
Generated MRI volumes showed anatomical plausibility and consistency with tabular data.
The model outperformed CTGAN and matched TVAE in tabular data synthesis.
Quantitative metrics confirmed high-fidelity multimodal data generation.
Abstract
We propose a multimodal latent diffusion model that jointly synthesizes volumetric magnetic resonance imaging (MRI) and tabular clinical data within a shared latent space via cross-attention. This approach enables coherent joint representation learning of MRI and tabular modalities for generative modeling. Our model utilizes a variational autoencoder to fuse the two modalities before diffusion-based synthesis, allowing modality-appropriate reconstruction with separate decoders for MRI and tabular data. We evaluated the framework on data from the German National Cohort (NAKO Gesundheitsstudie), comprising over 10,000 participants with MRI scans and clinical tabular features such as age, sex, body measurements, and ethnicity. The generated MRI volumes exhibited anatomical plausibility and body composition consistent with the synthesized tabular attributes. Quantitative evaluation using…
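The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single attention head, random untrained weights, MRI tokens acting as queries over tabular tokens, and a Gaussian latent sampled with the usual VAE reparameterization. All dimensions and names (`mri_tokens`, `tab_tokens`, `w_mu`, `w_logvar`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens."""
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c) scaled dot products
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, d_latent = 16, 8
n_mri_tokens = 8   # assumption: flattened patches of an encoded MRI volume
n_tab_tokens = 4   # assumption: one embedding per tabular feature (e.g. age, sex)

# Stand-ins for encoder outputs; a real model would learn these.
mri_tokens = rng.normal(size=(n_mri_tokens, d_model))
tab_tokens = rng.normal(size=(n_tab_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

# Assumption: MRI tokens query the tabular tokens; the result is added residually.
attended, weights = cross_attention(mri_tokens, tab_tokens, w_q, w_k, w_v)
fused = mri_tokens + attended

# Project the fused tokens to the parameters of a shared Gaussian latent
# and sample with the reparameterization trick.
w_mu = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
w_logvar = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
mu, logvar = fused @ w_mu, fused @ w_logvar
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

print(fused.shape, z.shape)
```

In a full pipeline of the kind the abstract describes, a diffusion model would be trained on latents like `z`, and two separate decoders would map a sampled latent back to an MRI volume and a tabular row. Those parts are omitted here.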
