PathAlign: A vision-language model for whole slide images in histopathology
Faruk Ahmed, Andrew Sellergren, Lin Yang, Shawn Xu, Boris Babenko, Abbi Ward, Niels Olson, Arash Mohtashamian, Yossi Matias, Greg S. Corrado, Quang Duong, Dale R. Webster, Shravya Shetty, Daniel Golden, Yun Liu, David F. Steiner, Ellery Wulczyn

TL;DR
PathAlign introduces a vision-language model for whole slide images in histopathology, enabling retrieval, report generation, and classification by leveraging large-scale image-text pairs from pathology reports.
Contribution
This work develops a novel vision-language model for WSIs using curated pathology report text, enabling retrieval and generative applications without region annotations.
Findings
Pathologists rated the model-generated text as accurate, without clinically significant error or omission, for 78% of WSIs on average.
The model supports slide retrieval and WSI classification.
The results demonstrate the potential of language-aligned WSI embeddings.
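The retrieval and classification findings above depend on slide and text embeddings living in one shared space, where similarity can be compared directly. The following is a minimal sketch of that idea, not the authors' implementation: the three-dimensional vectors are made-up placeholders standing in for encoder outputs, and the names `cosine`, `rank_slides`, and `classify` are hypothetical helpers introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_slides(text_embedding, slide_embeddings):
    """Text-to-slide retrieval: sort slides by similarity to a text query."""
    scored = [
        (slide_id, cosine(text_embedding, emb))
        for slide_id, emb in slide_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def classify(slide_embedding, label_embeddings):
    """Classification by nearest text label in the shared space."""
    return max(
        label_embeddings,
        key=lambda label: cosine(slide_embedding, label_embeddings[label]),
    )


# Placeholder embeddings (assumed values, not real model outputs).
slide_embeddings = {
    "slide_a": [0.9, 0.1, 0.0],
    "slide_b": [0.1, 0.8, 0.1],
    "slide_c": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 0.9, 0.0]  # embedding of a text query
label_embeddings = {
    "label_x": [1.0, 0.0, 0.0],  # embedding of one class description
    "label_y": [0.0, 0.0, 1.0],  # embedding of another class description
}

ranking = rank_slides(query_embedding, slide_embeddings)
predicted = classify(slide_embeddings["slide_c"], label_embeddings)
```

Here `ranking` orders the slides from most to least similar to the query, and `predicted` is the label whose text embedding lies closest to the chosen slide. In practice the vectors would come from trained image and text encoders and be far higher-dimensional.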
Abstract
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared…
Code & Models
- 🤗 google/medgemma-1.5-4b-it (model): 86k downloads, 536 likes
- 🤗 unsloth/medgemma-1.5-4b-it-GGUF (model): 6.7k downloads, 33 likes
- 🤗 unsloth/medgemma-1.5-4b-it (model): 3.7k downloads, 5 likes
- 🤗 unsloth/medgemma-1.5-4b-it-unsloth-bnb-4bit (model): 510 downloads, 2 likes
- 🤗 unsloth/medgemma-1.5-4b-it-bnb-4bit (model): 287 downloads, 3 likes
- 🤗 zero0303/medgemma-1.5-4b-it (model): 613 downloads
- 🤗 gabrielbuzzi/medgemma-1.5-4b-it (model)
- 🤗 FastFlowLM/medgemma-1.5-4b-it-NPU2 (model): 113 downloads, 1 like
- 🤗 amewebstudio/medgemma-sickle-cell (model): 5 downloads, 1 like
Taxonomy
Topics: AI in cancer detection · Digital Imaging for Blood Diseases
