PathAlign: A vision-language model for whole slide images in histopathology

Faruk Ahmed; Andrew Sellergren; Lin Yang; Shawn Xu; Boris Babenko; Abbi Ward; Niels Olson; Arash Mohtashamian; Yossi Matias; Greg S. Corrado; Quang Duong; Dale R. Webster; Shravya Shetty; Daniel Golden; Yun Liu; David F. Steiner; Ellery Wulczyn

arXiv:2406.19578 · cs.CV · July 1, 2024 · 2 cites

Open Access

TL;DR

PathAlign introduces a vision-language model for whole slide images in histopathology, enabling retrieval, report generation, and classification by leveraging large-scale image-text pairs from pathology reports.

Contribution

This work develops a novel vision-language model for WSIs using curated pathology report text, enabling retrieval and generative applications without region annotations.

Findings

01 In pathologist evaluation, text for 78% of WSIs on average was rated accurate, without clinically significant error or omission.

02 The model enables effective slide retrieval and classification.

03 The results demonstrate the potential of language-aligned WSI embeddings.
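
To make the idea of language-aligned embeddings concrete, the following is a minimal sketch of text-to-slide retrieval in a shared embedding space. It is not the PathAlign implementation: the function names (`cosine`, `rank_slides`), the slide identifiers, and the three-dimensional vectors are all hypothetical toy values, and cosine similarity is assumed here as the ranking score purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_slides(text_embedding, slide_embeddings):
    """Return (slide_id, score) pairs sorted from most to least similar."""
    scored = [
        (slide_id, cosine(text_embedding, emb))
        for slide_id, emb in slide_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings; a real system would obtain these from
# trained image and text encoders that map into the same space.
slide_embeddings = {
    "slide_a": [0.9, 0.1, 0.0],
    "slide_b": [0.1, 0.9, 0.1],
    "slide_c": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # stands in for an embedded text query

ranking = rank_slides(query_embedding, slide_embeddings)
for slide_id, score in ranking:
    print(f"{slide_id}: {score:.3f}")
```

The same ranking step could in principle serve classification, by embedding one text prompt per class and assigning a slide to the class whose prompt scores highest; whether that matches the paper's classification setup is not stated in this summary.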

Abstract

Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared…

Taxonomy

Topics: AI in cancer detection · Digital Imaging for Blood Diseases