XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda

TL;DR
XGeM is a large multimodal generative model that enables flexible, multi-input, multi-output synthesis of medical data, addressing data scarcity and privacy issues in medical imaging.
Contribution
XGeM introduces a Multi-Prompt Training strategy and a shared latent space for joint synthesis of multiple medical data modalities, going beyond unimodal, unidirectional approaches.
Findings
Outperforms five competitors on the MIMIC-CXR dataset
Generates realistic, clinically relevant data, as validated by radiologists
Helps address medical data challenges such as anonymization and class imbalance
Abstract
The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to…
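The two ideas named in the abstract, a contrastively learned shared latent space and conditioning on arbitrary subsets of input modalities, can be illustrated with a minimal sketch. This is not the authors' implementation: the symmetric InfoNCE loss is a common choice for contrastive alignment and is assumed here, and the function names, the temperature value, and the modality names (`frontal_xray`, `lateral_xray`, `report`) are hypothetical.

```python
import itertools

import numpy as np


def info_nce(za, zb, temperature=0.07):
    """Symmetric InfoNCE loss for two batches of paired embeddings.

    Row i of `za` and row i of `zb` are assumed to come from the same
    study (a positive pair); every other row acts as a negative.
    """
    # L2-normalise so the dot product is a cosine similarity.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature  # shape (batch, batch)

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The matching pair sits on the diagonal.
        return -np.mean(np.diag(log_probs))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def prompt_subsets(modalities, target):
    """List every non-empty subset of the modalities other than `target`.

    Each subset is one possible set of conditioning inputs when the
    model is asked to generate `target` (an assumed reading of
    'arbitrary subsets of input modalities').
    """
    sources = [m for m in modalities if m != target]
    return [
        combo
        for size in range(1, len(sources) + 1)
        for combo in itertools.combinations(sources, size)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    print("aligned pairs loss:", info_nce(z, z))
    print("mismatched pairs loss:", info_nce(z, z[::-1]))
    # Hypothetical modality names, for illustration only.
    for subset in prompt_subsets(["frontal_xray", "lateral_xray", "report"], "report"):
        print("condition on", subset, "-> generate report")
```

In this sketch, well-aligned pairs yield a lower loss than mismatched ones. A training loop could sample one conditioning subset per step, so that the generator sees both single-input and multi-input prompts.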
Taxonomy
Methods: Diffusion · Contrastive Learning
