CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu, Xuan Gong, Kai Zhang, Pingyi Chen, Ye Zhang, Zhongyi Shui, Tao Lin, Lin Yang

TL;DR
CPath-Omni is a large multimodal foundation model that unifies patch and whole-slide image analysis in pathology, achieving state-of-the-art results across multiple tasks and datasets.
Contribution
The paper introduces CPath-Omni, the first 15-billion-parameter LMM to consolidate patch-level and WSI-level analysis, and develops a specialized CLIP-based visual processor for pathology.
Findings
Achieves SOTA on 39 out of 42 datasets across seven tasks.
Outperforms or matches task-specific models.
First to integrate diverse vision models with a large language model in pathology.
Abstract
The emergence of large multimodal models (LMMs) has brought significant advancements to pathology. Previous research has primarily focused on separately training patch-level and whole-slide image (WSI)-level models, limiting the integration of learned knowledge across patches and WSIs, and resulting in redundant models. In this work, we introduce CPath-Omni, the first 15-billion-parameter LMM designed to unify both patch and WSI level image analysis, consolidating a variety of tasks at both levels, including classification, visual question answering, captioning, and visual referring prompting. Extensive experiments demonstrate that CPath-Omni achieves state-of-the-art (SOTA) performance across seven diverse tasks on 39 out of 42 datasets, outperforming or matching task-specific models trained for individual tasks. Additionally, we develop a specialized pathology CLIP-based visual…
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Digital Imaging for Blood Diseases
Methods: Contrastive Language-Image Pre-training
