PixCell: A generative foundation model for digital histopathology images

Srikar Yellapragada; Alexandros Graikos; Zilinghan Li; Kostas Triaridis; Varun Belagali; Tarak Nath Nandi; Karen Bai; Beatrice S. Knudsen; Tahsin Kurc; Rajarsi R. Gupta; Prateek Prasanna; Ravi K Madduri; Joel Saltz; Dimitris Samaras

arXiv:2506.05127 · eess.IV · December 4, 2025

TL;DR

PixCell is a diffusion-based generative foundation model for histopathology images. It synthesizes realistic images without requiring annotated data, supporting data augmentation, privacy preservation, and virtual staining.

Contribution

The paper introduces PixCell, the first large-scale, self-supervised diffusion model for histopathology images, enabling diverse image synthesis and downstream applications like virtual staining and privacy-preserving data sharing.

Findings

01. PixCell effectively generates high-fidelity synthetic histopathology images.

02. Synthetic data improves classification performance on small datasets.

03. PixCell enables virtual IHC staining from H&E images.

Abstract

The digitization of histology slides has revolutionized pathology, providing massive datasets for cancer diagnosis and research. Self-supervised and vision-language models have been shown to effectively mine large pathology datasets to learn discriminative representations. On the other hand, there are unique problems in pathology, such as annotated data scarcity, privacy regulations in data sharing, and inherently generative tasks like virtual staining. Generative models, capable of synthesizing realistic and diverse images, present a compelling solution to address these problems through image synthesis. We introduce PixCell, the first generative foundation model for histopathology images. PixCell is a diffusion model trained on PanCan-30M, a large, diverse dataset derived from 69,184 H&E-stained whole slide images of various cancer types. We employ a progressive training strategy and a…
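Since the abstract describes PixCell as a diffusion model, a minimal sketch of the forward (noising) step that diffusion models are typically trained against may help readers new to the idea. This is a generic DDPM-style illustration, not the authors' training code: the linear schedule, the step count of 1000, the function names, and the toy four-value "image" are all assumptions made for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal_scale * v + noise_scale * e for v, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)                 # fixed seed for repeatability
betas = linear_beta_schedule(1000)     # assumed number of diffusion steps
alpha_bars = cumulative_alphas(betas)
pixels = [0.5, -0.25, 1.0, 0.0]        # hypothetical toy image in [-1, 1]
noisy, eps = add_noise(pixels, t=999, alpha_bars=alpha_bars, rng=rng)
```

In a setup like this, a denoising network would be trained to predict `eps` from `noisy` and `t`, and generation would run the learned denoiser in reverse from pure noise. How PixCell parameterizes, conditions, or schedules this process is not stated in the text above.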

Taxonomy

Methods: Sparse Evolutionary Training