Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention

Daniel Mensing; Jan Kapar; Jochen G. Hirsch; Matthias Günther; Horst Hahn; Marvin N. Wright

arXiv:2605.06699·eess.IV·May 11, 2026

TL;DR

This paper introduces a multimodal latent diffusion model that jointly synthesizes volumetric MRI and tabular clinical data in a shared latent space, so that the generated images and the generated clinical attributes are consistent with each other.

Contribution

This work is the first to demonstrate joint modeling of MRI and mixed-type tabular data using a diffusion framework with a shared latent space, advancing multimodal generative modeling in healthcare.

Findings

01

Generated MRI volumes showed anatomical plausibility and consistency with tabular data.

02

The model outperformed CTGAN and matched TVAE in tabular data synthesis.

03

Quantitative metrics confirmed high-fidelity multimodal data generation.

Abstract

We propose a multimodal latent diffusion model that jointly synthesizes volumetric magnetic resonance imaging (MRI) and tabular clinical data within a shared latent space via cross-attention. This approach enables coherent joint representation learning of MRI and tabular modalities for generative modeling. Our model utilizes a variational autoencoder to fuse the two modalities before diffusion-based synthesis, allowing modality-appropriate reconstruction with separate decoders for MRI and tabular data. We evaluated the framework on data from the German National Cohort (NAKO Gesundheitsstudie), comprising over 10,000 participants with MRI scans and clinical tabular features such as age, sex, body measurements, and ethnicity. The generated MRI volumes exhibited anatomical plausibility and body composition consistent with the synthesized tabular attributes. Quantitative evaluation using…
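To make the fusion step in the abstract more concrete, the following is a minimal sketch of single-head cross-attention between two sets of tokens, written in plain Python. It is not the authors' implementation. The choice of MRI latent tokens as queries and tabular feature embeddings as keys and values, the residual connection, all dimensions, and every name (`cross_attention`, `mri_tokens`, `tab_tokens`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix; stands in for learned weights or encoder outputs."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens.

    queries: (n, d_model), context: (m, d_ctx)
    w_q: (d_model, d_attn), w_k: (d_ctx, d_attn), w_v: (d_ctx, d_model)
    Returns the fused tokens (n, d_model) and the attention weights (n, m).
    """
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    d_attn = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose keys to (d_attn, m)
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_attn) for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Residual connection (an assumption): keep the query content and add
    # the information gathered from the other modality.
    fused = [[a + b for a, b in zip(q_row, a_row)]
             for q_row, a_row in zip(queries, attended)]
    return fused, weights


rng = random.Random(0)
# Hypothetical sizes: 8 MRI latent tokens of width 16, 4 tabular tokens of width 6.
mri_tokens = random_matrix(8, 16, rng)
tab_tokens = random_matrix(4, 6, rng)
w_q = random_matrix(16, 8, rng)
w_k = random_matrix(6, 8, rng)
w_v = random_matrix(6, 16, rng)

fused, weights = cross_attention(mri_tokens, tab_tokens, w_q, w_k, w_v)
print(len(fused), len(fused[0]))    # number of fused tokens and their width
print(len(weights), len(weights[0]))  # one weight per (MRI token, tabular token) pair
```

Each row of `weights` sums to one, so every fused MRI token is the original token plus a convex combination of projected tabular tokens. In a full model the random matrices would be learned parameters, and the fused tokens would feed the variational autoencoder's latent space before diffusion.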
