A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model

Zhe Xu; Ziyi Liu; Junlin Hou; Jiabo Ma; Cheng Jin; Yihui Wang; Zhixuan Chen; Zhengyu Zhang; Fuxiang Huang; Zhengrui Guo; Fengtao Zhou; Yingxue Xu; Xi Wang; Ronald Cheong Kin Chan; Li Liang; Hao Chen

arXiv:2507.17303 · eess.IV · August 20, 2025

Open Access

TL;DR

This paper introduces SmartPath-R1, a versatile multimodal large language model for pathology that performs multiple diagnostic tasks with enhanced reasoning, without requiring expensive chain-of-thought annotations, and covers a broad spectrum of clinical pathology applications at both the ROI and WSI levels.

Contribution

The study presents a novel reasoning-enhanced multimodal LLM that handles diverse pathology tasks without chain-of-thought supervision, using scale-dependent fine-tuning and a mixture-of-experts mechanism.

Findings

1. Outperforms existing models on 72 pathology tasks
2. Handles both ROI-level and WSI-level analyses effectively
3. Achieves robust reasoning without chain-of-thought annotations

Abstract

Multimodal large language models (MLLMs) have emerged as powerful tools for computational pathology, offering unprecedented opportunities to integrate pathological images with language context for comprehensive diagnostic analysis. These models hold particular promise for automating complex tasks that traditionally require expert interpretation by pathologists. However, current MLLM approaches in pathology demonstrate significantly constrained reasoning capabilities, primarily due to their reliance on expensive chain-of-thought annotations. Additionally, existing methods remain limited to a single application, visual question answering (VQA) at the region-of-interest (ROI) level, and fail to address the full spectrum of diagnostic needs in clinical practice, such as ROI classification, detection, and segmentation, as well as whole-slide-image (WSI) classification and VQA. In this study, we present…

Taxonomy

Topics: Topic Modeling