A Generative Foundation Model for Multimodal Histopathology
Jinxi Xiang, Mingjie Li, Siyu Hou, Yijiang Chen, Xiangde Luo, Yuanfeng Ji, Xiang Zhou, Ehsan Adeli, Akshay Chaudhari, Curtis P. Langlotz, Kilian M. Pohl, Ruijiang Li

TL;DR
MuPD is a large-scale generative model that integrates histology, molecular, and clinical data to enable diverse, high-fidelity multimodal tissue synthesis and translation with minimal fine-tuning.
Contribution
This work introduces MuPD, a unified diffusion transformer model pretrained on extensive multimodal pathology data, enabling versatile cross-modal synthesis and translation tasks.
Findings
Reduces Fréchet Inception Distance (FID; lower is better) by 50% in text-conditioned histology generation.
Improves few-shot classification accuracy by up to 47% with synthetic data.
Enhances marker correlation by 37% in virtual staining.
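For readers unfamiliar with the FID figure above, the metric compares the mean and covariance of feature embeddings from real and generated images. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes features have already been extracted by some image encoder (which encoder the paper uses is not stated in this summary), and it substitutes random vectors for real embeddings. The function name `frechet_distance` is hypothetical.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Assumption: random vectors stand in for encoder features of image patches.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
close = rng.normal(size=(500, 8))             # same distribution as `real`
shifted = rng.normal(loc=1.0, size=(500, 8))  # mean-shifted distribution

fid_close = frechet_distance(real, close)
fid_shifted = frechet_distance(real, shifted)
print(fid_close, fid_shifted)
```

A generated set whose features match the real distribution yields a distance near zero, while the mean-shifted set scores much higher, which is why a lower FID is read as higher fidelity.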
Abstract
Accurate diagnosis and treatment of complex diseases require integrating histological, molecular, and clinical data, yet in practice these modalities are often incomplete owing to tissue scarcity, assay cost, and workflow constraints. Existing computational approaches attempt to impute missing modalities from available data but rely on task-specific models trained on narrow, single source-target pairs, limiting their generalizability. Here we introduce MuPD (Multimodal Pathology Diffusion), a generative foundation model that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer with decoupled cross-modal attention. Pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 human organs, MuPD supports diverse…
