Multimodal Whole Slide Foundation Model for Pathology

Tong Ding; Sophia J. Wagner; Andrew H. Song; Richard J. Chen; Ming Y. Lu; Andrew Zhang; Anurag J. Vaidya; Guillaume Jaume; Muhammad Shaban; Ahrong Kim; Drew F.K. Williamson; Bowen Chen; Cristina Almagro-Perez; Paul Doucet; Sharifa Sahai; Chengkuan Chen; Daisuke Komura; Akihiro Kawabe; Shumpei Ishikawa; Georg Gerber; Tingying Peng; Long Phi Le; Faisal Mahmood

arXiv:2411.19666 · eess.IV · December 2, 2024 · 23 cites

TL;DR

This paper introduces TITAN, a multimodal foundation model for pathology that leverages extensive self-supervised and vision-language training on whole slide images and reports, enabling effective clinical task performance without fine-tuning.

Contribution

The paper presents TITAN, a novel multimodal foundation model trained on large-scale pathology data, capable of generalizing to resource-limited clinical scenarios without fine-tuning.

Findings

01. TITAN outperforms existing models in classification and retrieval tasks.

02. TITAN effectively generates pathology reports without clinical labels.

03. TITAN demonstrates strong zero-shot and few-shot learning capabilities.

Abstract

The field of computational pathology has been transformed by recent advances in foundation models that encode histopathology regions of interest (ROIs) into versatile and transferable feature representations via self-supervised learning (SSL). However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions. We propose TITAN, a multimodal whole slide foundation model pretrained using 335,645 WSIs via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated from a multimodal generative AI copilot for pathology. Without any finetuning or requiring clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports…
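Vision-language alignment generally places image and text embeddings in a shared space, which is what makes label-free (zero-shot) classification possible. The sketch below illustrates that general idea only. It is not TITAN's API or the authors' implementation: the function names (`cosine`, `zero_shot_classify`), the class prompts, and the three-dimensional toy vectors are all hypothetical stand-ins for real slide and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(slide_embedding, prompt_embeddings):
    """Pick the class whose text-prompt embedding is closest to the slide.

    prompt_embeddings maps a class label to the embedding of a text prompt
    describing that class (hypothetical; produced by some text encoder).
    """
    scores = {
        label: cosine(slide_embedding, emb)
        for label, emb in prompt_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors standing in for real encoder outputs (assumed, not from the paper).
prompt_embeddings = {
    "lung adenocarcinoma": [0.9, 0.1, 0.0],
    "lung squamous cell carcinoma": [0.1, 0.9, 0.2],
}
slide_embedding = [0.8, 0.2, 0.1]

label, scores = zero_shot_classify(slide_embedding, prompt_embeddings)
print(label)
```

No labeled training slides are involved: the only supervision is the text of the class prompts, so adding a new class amounts to adding a new prompt embedding.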

Code & Models

Models

🤗 sofieneb/histaug-conch_v15 · model · 11 dl

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection