XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation

Daniele Molino; Francesco Di Feola; Eliodoro Faiella; Deborah Fazzini; Domiziana Santucci; Linlin Shen; Valerio Guarrasi; Paolo Soda

arXiv:2501.04614 · cs.AI · February 16, 2026

TL;DR

XGeM is a large multimodal generative model that enables flexible, multi-input, multi-output synthesis of medical data, addressing data scarcity and privacy issues in medical imaging.

Contribution

XGeM introduces a novel Multi-Prompt Training strategy and a shared latent space for joint multimodal medical data synthesis, going beyond unimodal approaches.
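
To make the idea of conditioning on arbitrary subsets of input modalities concrete, here is a minimal sketch of how prompt subsets could be enumerated and sampled during training. It is an illustration only, not the authors' procedure: the modality names, the helper functions, and the averaging rule used to fuse prompt latents are all assumptions that do not come from the paper summary.

```python
import random
from itertools import combinations

# Hypothetical modality names; this summary does not enumerate the real ones.
MODALITIES = ("frontal_xray", "lateral_xray", "report")


def prompt_subsets(modalities, target):
    """Return every non-empty subset of the modalities other than the target."""
    sources = [m for m in modalities if m != target]
    return [
        subset
        for size in range(1, len(sources) + 1)
        for subset in combinations(sources, size)
    ]


def sample_training_prompt(modalities, rng):
    """Pick a target modality, then a random non-empty subset of the rest as prompt."""
    target = rng.choice(modalities)
    prompt = rng.choice(prompt_subsets(modalities, target))
    return prompt, target


def fuse_prompt(latents, prompt):
    """Average the shared-space latents of the prompt modalities.

    Averaging is an assumed fusion rule, chosen only to keep the sketch short.
    """
    vectors = [latents[m] for m in prompt]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Calling `sample_training_prompt(MODALITIES, random.Random(0))` once per training step would expose a model to single-modality and multi-modality prompts alike, which is the property the contribution statement emphasizes.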

Findings

01. Outperforms five competitors on the MIMIC-CXR dataset

02. Generates realistic, clinically relevant data, as validated by radiologists

03. Helps address medical data challenges such as anonymization and class imbalance

Abstract

The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to…
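
The abstract states that the shared latent space is built via contrastive learning but, in this truncated form, does not give the objective. As a hedged illustration, the sketch below implements a symmetric InfoNCE loss, a common choice for aligning paired embeddings from two modalities. The function names, the temperature value of 0.07, and the use of cosine similarity are assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.07):
    """Mean cross-entropy of matching anchors[i] to candidates[i] among all candidates."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(anchors)


def symmetric_contrastive_loss(a, b, temperature=0.07):
    """Average the a-to-b and b-to-a InfoNCE terms (assumed symmetric form)."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))
```

With paired embeddings where row i of one modality corresponds to row i of the other, the loss is small when matching pairs are the most similar and grows when they are mismatched, which pulls paired samples together in the shared space.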

Taxonomy

Methods: Diffusion · Contrastive Learning