A Generative Foundation Model for Multimodal Histopathology

Jinxi Xiang; Mingjie Li; Siyu Hou; Yijiang Chen; Xiangde Luo; Yuanfeng Ji; Xiang Zhou; Ehsan Adeli; Akshay Chaudhari; Curtis P. Langlotz; Kilian M. Pohl; Ruijiang Li

arXiv:2604.03635 · cs.CV · April 7, 2026

TL;DR

MuPD is a large-scale generative model that integrates histology, molecular, and clinical data to enable diverse, high-fidelity multimodal tissue synthesis and translation with minimal fine-tuning.

Contribution

This work introduces MuPD, a unified diffusion transformer pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs, which supports a range of cross-modal synthesis and translation tasks.

Findings

01. Reduces FID by 50% in text-conditioned histology generation.

02. Improves few-shot classification accuracy by up to 47% with synthetic data.

03. Improves marker correlation by 37% in virtual staining.
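
For readers unfamiliar with the metric in the first finding: FID (Fréchet Inception Distance) compares the mean and covariance of feature embeddings from real and generated images, and lower is better. The sketch below computes the Fréchet distance between two feature sets under a Gaussian assumption. The feature extractor is not specified in this summary, so the function simply takes precomputed feature arrays; the function name and array shapes are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (n_samples, n_features),
    assumed to come from the same (unspecified) feature extractor.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative
    # for covariance matrices (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Under this definition, a 50% reduction means the generated-image feature statistics sit at half the Fréchet distance from the real-image statistics compared with the baseline.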

Abstract

Accurate diagnosis and treatment of complex diseases require integrating histological, molecular, and clinical data, yet in practice these modalities are often incomplete owing to tissue scarcity, assay cost, and workflow constraints. Existing computational approaches attempt to impute missing modalities from available data but rely on task-specific models trained on narrow, single source-target pairs, limiting their generalizability. Here we introduce MuPD (Multimodal Pathology Diffusion), a generative foundation model that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer with decoupled cross-modal attention. Pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 human organs, MuPD supports diverse…
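
The abstract names "decoupled cross-modal attention" without giving details here. One common reading of that phrase, shown below as a minimal NumPy sketch, is that image latent tokens share a single query projection while each conditioning modality (for example text or RNA) gets its own key/value projections, and the per-modality attention outputs are summed. This is an assumption for illustration, not the MuPD architecture; `make_params`, `decoupled_cross_attention`, the single-head layout, and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # q: (n_img, d); k, v: (n_ctx, d). Single head, scaled dot-product.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def make_params(rng, d, ctx_dims):
    # Hypothetical helper: one shared query projection, plus a separate
    # (key, value) projection pair per conditioning modality.
    params = {"wq": rng.normal(size=(d, d)) / np.sqrt(d)}
    for name, d_ctx in ctx_dims.items():
        params[name] = (rng.normal(size=(d_ctx, d)) / np.sqrt(d_ctx),
                        rng.normal(size=(d_ctx, d)) / np.sqrt(d_ctx))
    return params

def decoupled_cross_attention(x, contexts, params):
    """x: (n_img, d) image latent tokens.
    contexts: dict mapping modality name -> (n_ctx, d_ctx) array, or None
    when that modality is missing for this sample.
    """
    q = x @ params["wq"]
    out = np.zeros_like(q)
    for name, ctx in contexts.items():
        if ctx is None:
            continue  # a missing modality contributes nothing
        wk, wv = params[name]
        out += cross_attention(q, ctx @ wk, ctx @ wv)
    return x + out  # residual connection
```

In this sketch each modality's branch is computed independently, so a sample with only text or only RNA available goes through the same layer with the absent branch skipped.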
