Navigating Gigapixel Pathology Images with Large Multimodal Models
Thomas A. Buckley, Kian R. Weihrauch, Katherine Latham, Andrew Z. Zhou, Padmini A. Manrai, Arjun K. Manrai

TL;DR
This paper introduces GIANT, a framework that lets large multimodal models iteratively navigate gigapixel pathology images, improving performance on complex clinical questions over patch- and thumbnail-based baselines.
Contribution
We developed GIANT, the first system allowing LMMs to navigate whole-slide images interactively, and released MultiPathQA, a new benchmark for pathology reasoning tasks.
Findings
GIANT outperforms patch- and thumbnail-based baselines.
GPT-5 with GIANT achieves 62.5% accuracy on pathologist-authored questions.
Performance with GIANT approaches or surpasses that of specialized pathology models.
Abstract
Despite being widely used to support clinical care, general-purpose large multimodal models (LMMs) have generally shown poor or inconclusive performance in medical image interpretation, particularly in pathology, where gigapixel images are used. However, prior studies have used either low-resolution thumbnails or random patches, which likely underestimated model performance. Here, we ask whether LMMs can be adapted to reason coherently and accurately in the evaluation of such images. In this study, we introduce Gigapixel Image Agent for Navigating Tissue (GIANT), the first framework that allows LMMs to iteratively navigate whole-slide images (WSIs) like a pathologist. Accompanying GIANT, we release MultiPathQA, a new benchmark, which comprises 934 WSI-level questions, encompassing five clinically-relevant tasks ranging from cancer diagnosis to open-ended reasoning. MultiPathQA also…
Taxonomy
Topics: AI in cancer detection · Digital Imaging for Blood Diseases · Multimodal Machine Learning Applications
