PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation
Faruk Ahmed, Lin Yang, Tiam Jaroensri, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Greg S. Corrado, Dale R. Webster, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Daniel Golden, Ellery Wulczyn, David F. Steiner

TL;DR
This paper introduces PolyPath, a multimodal model that leverages Gemini 1.5 Flash's long context window to generate accurate pathology reports from multiple whole-slide images, significantly advancing multi-slide interpretation.
Contribution
It demonstrates the first large multimodal model capable of integrating thousands of image patches across multiple slides for clinical report generation in pathology.
Findings
Achieved clinically accurate reports for 68% of multi-slide cases.
Processed up to 40,000 image patches of 768x768 pixels at 10X magnification.
Performance declined for cases with six or more slides, indicating room for improvement.
Abstract
The interpretation of histopathology cases underlies many important diagnostic and treatment decisions in medicine. Notably, this process typically requires pathologists to integrate and summarize findings across multiple slides per case. Existing vision-language capabilities in computational pathology have so far been largely limited to small regions of interest, larger regions at low magnification, or single whole-slide images (WSIs). This limits interpretation of findings that span multiple high-magnification regions across multiple WSIs. By making use of Gemini 1.5 Flash, a large multimodal model (LMM) with a 1-million token context window, we demonstrate the ability to generate bottom-line diagnoses from up to 40,000 768x768 pixel image patches from multiple WSIs at 10X magnification. This is the equivalent of up to 11 hours of video at 1 fps. Expert pathologist evaluations…
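The video comparison in the abstract can be checked with a few lines of arithmetic. The sketch below assumes, as the comparison implies, that each image patch counts as one video frame; the function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the abstract's scale figures.
# Assumption: one image patch is treated as one video frame.

def video_hours_equivalent(num_patches: int, fps: float = 1.0) -> float:
    """Hours of video containing `num_patches` frames at `fps` frames per second."""
    seconds = num_patches / fps
    return seconds / 3600.0

def total_pixels(num_patches: int, side: int = 768) -> int:
    """Total pixel count across `num_patches` square patches of `side` x `side` pixels."""
    return num_patches * side * side

hours = video_hours_equivalent(40_000, fps=1.0)
pixels = total_pixels(40_000, side=768)

print(f"{hours:.1f} hours of video at 1 fps")   # about 11.1 hours
print(f"{pixels / 1e9:.1f} gigapixels in total")  # about 23.6 gigapixels
```

At 1 fps, 40,000 frames come to roughly 11.1 hours, consistent with the "up to 11 hours" figure quoted in the abstract.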
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
