A Generalist Model for Diverse Text-Guided Medical Image Synthesis

Joseph Cho; Mrudang Mathur; Cyril Zakka; Dhamanpreet Kaur; Matthew Leipzig; Alex Dalal; Aravind Krishnan; Eubee Koo; Karen Wai; Cindy S. Zhao; Akshay Chaudhari; Matthew Duda; Ashley Choi; Ehsan Rahimy; Lyna Azzouz; Robyn Fong; Rohan Shad; William Hiesinger

arXiv:2405.09806·cs.CV·April 22, 2026·2 cites

TL;DR

This paper introduces MediSyn, a generalist, text-guided medical image synthesis model trained exclusively on publicly available data. It generates synthetic images across 6 medical specialties and 10 imaging modalities.

Contribution

The paper presents MediSyn, an open-access, multi-specialty, multi-modality medical image generator trained solely on public data. A single generalist model covers tasks that would otherwise need several task-specific generators.

Findings

01. Training on diverse images does not reduce quality.
02. The generalist model is more efficient than multiple task-specific models.
03. Synthetic images improve classifier performance in data-limited scenarios.
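
To make the third finding concrete, the following is a minimal sketch of how a small real training set might be padded with synthetic samples before training a classifier. It is not the authors' pipeline: the function name `augment_with_synthetic`, the `(path, label)` sample format, and the `synthetic_per_real` ratio are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical sample format: (image path, class label).
Sample = Tuple[str, int]


def augment_with_synthetic(
    real: List[Sample],
    synthetic: List[Sample],
    synthetic_per_real: float = 1.0,
    seed: int = 0,
) -> List[Sample]:
    """Return the real samples plus a random subset of synthetic ones.

    synthetic_per_real is an assumed knob: the number of synthetic
    samples to add per real sample, capped by how many are available.
    """
    rng = random.Random(seed)
    wanted = int(round(len(real) * synthetic_per_real))
    n_extra = min(len(synthetic), wanted)
    extra = rng.sample(synthetic, n_extra)  # sampling without replacement
    combined = list(real) + extra
    rng.shuffle(combined)  # mix real and synthetic samples
    return combined


if __name__ == "__main__":
    real = [(f"real_{i}.png", i % 2) for i in range(4)]
    synthetic = [(f"synthetic_{i}.png", i % 2) for i in range(10)]
    train = augment_with_synthetic(real, synthetic, synthetic_per_real=1.5)
    print(len(train), "training samples")
```

Every real sample is kept and only the synthetic pool is subsampled, so the ratio can be varied to check whether added synthetic data helps at a given real-data budget.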

Abstract

Deep learning algorithms require extensive data to achieve robust performance. However, data availability is often restricted in the medical domain due to patient privacy concerns. Synthetic data presents a possible solution to these challenges. Image generative models have found increasing use for medical applications, but are often task-specific, thus limiting their scalability. Moreover, existing models frequently rely on private datasets for training, which constrain their reproducibility. To address this, we introduce MediSyn: an open-access, generalist, text-guided latent diffusion model capable of generating synthetic images across 6 medical specialties and 10 imaging modalities, while being trained exclusively on publicly available data. Through extensive experimentation, we provide several key contributions. First, we demonstrate that training a generative model on visually…

Code & Models

Models

Hugging Face: hiesingerlab/MediSyn (model, 32 downloads)