Navigating Gigapixel Pathology Images with Large Multimodal Models

Thomas A. Buckley; Kian R. Weihrauch; Katherine Latham; Andrew Z. Zhou; Padmini A. Manrai; Arjun K. Manrai

arXiv:2511.19652·cs.CV·November 26, 2025

Open Access

TL;DR

This paper introduces GIANT, a framework that lets large multimodal models iteratively navigate gigapixel pathology images, improving performance on complex clinical questions over thumbnail- and patch-based baselines.

Contribution

We developed GIANT, the first system allowing LMMs to navigate whole-slide images interactively, and released MultiPathQA, a new benchmark for pathology reasoning tasks.

Findings

01. GIANT outperforms patch- and thumbnail-based baselines.

02. GPT-5 with GIANT achieves 62.5% accuracy on pathologist-authored questions.

03. Performance with GIANT approaches or surpasses that of specialized pathology models.

Abstract

Despite being widely used to support clinical care, general-purpose large multimodal models (LMMs) have generally shown poor or inconclusive performance in medical image interpretation, particularly in pathology, where gigapixel images are used. However, prior studies have used either low-resolution thumbnails or random patches, which likely underestimated model performance. Here, we ask whether LMMs can be adapted to reason coherently and accurately in the evaluation of such images. In this study, we introduce Gigapixel Image Agent for Navigating Tissue (GIANT), the first framework that allows LMMs to iteratively navigate whole-slide images (WSIs) like a pathologist. Accompanying GIANT, we release MultiPathQA, a new benchmark comprising 934 WSI-level questions that span five clinically relevant tasks, ranging from cancer diagnosis to open-ended reasoning. MultiPathQA also…
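
The iterative navigation idea in the abstract can be illustrated with a small sketch. This is not GIANT's implementation: the summary here does not describe its prompts, tools, or stopping rule, so every name below (`Region`, `clamp_region`, `downsample_factor`, `navigate`, `zoom_center`) is hypothetical. The sketch assumes the model is shown one downsampled view at a time and replies with the next region to inspect; the model is replaced by a stub policy that zooms into the center of the current view.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned box in full-resolution slide pixel coordinates."""
    x: int
    y: int
    width: int
    height: int


View = Tuple[Region, float]  # (region shown, downsample factor used to show it)


def clamp_region(region: Region, slide_w: int, slide_h: int) -> Region:
    """Shift and shrink a proposed region so it lies inside the slide."""
    w = max(1, min(region.width, slide_w))
    h = max(1, min(region.height, slide_h))
    x = min(max(region.x, 0), slide_w - w)
    y = min(max(region.y, 0), slide_h - h)
    return Region(x, y, w, h)


def downsample_factor(region: Region, max_side: int) -> float:
    """Factor by which the region is shrunk so its longer side fits max_side."""
    return max(1.0, max(region.width, region.height) / max_side)


def navigate(
    slide_size: Tuple[int, int],
    choose_next: Callable[[Region, List[View]], Optional[Region]],
    max_steps: int = 5,
    max_side: int = 1024,
) -> List[View]:
    """Start at the whole slide and repeatedly move to the proposed region.

    choose_next stands in for the model call (an assumption): it receives the
    current region and the views so far, and returns the next region or None
    to stop.
    """
    slide_w, slide_h = slide_size
    current = Region(0, 0, slide_w, slide_h)
    history: List[View] = []
    for step in range(max_steps):
        history.append((current, downsample_factor(current, max_side)))
        if step == max_steps - 1:
            break  # view budget exhausted; do not request an unseen region
        proposal = choose_next(current, history)
        if proposal is None:
            break
        current = clamp_region(proposal, slide_w, slide_h)
    return history


def zoom_center(current: Region, history: List[View]) -> Optional[Region]:
    """Stub policy: halve the view around its center until it is small."""
    if current.width <= 1024 and current.height <= 1024:
        return None
    return Region(
        current.x + current.width // 4,
        current.y + current.height // 4,
        current.width // 2,
        current.height // 2,
    )


if __name__ == "__main__":
    for region, factor in navigate((100_000, 80_000), zoom_center):
        print(region, f"downsample x{factor:.1f}")
```

In a real setting, the crop for each view would come from a slide reader and `choose_next` would wrap a model call. Keeping the policy as a plain callable makes it easy to compare a navigating agent against fixed thumbnail or random-patch strategies under the same view budget.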

Taxonomy

Topics: AI in cancer detection · Digital Imaging for Blood Diseases · Multimodal Machine Learning Applications