PathVLM-Eval: Evaluation of open vision language models in histopathology

Nauman Ullah Gilal; Rachida Zegour; Khaled Al-Thelaya; Erdener Özer; Marco Agus; Jens Schneider; Sabri Boughorbel

PMC · DOI: 10.1016/j.jpi.2025.100455 · June 5, 2025



Open Access

TL;DR

This paper evaluates vision language models on histopathology tasks using a specialized benchmark to improve medical diagnosis and training.

Contribution

The paper introduces an extensive benchmark and evaluation framework for VLMs in histopathology, testing over 60 models.

Findings

01

Qwen2-VL-72B-Instruct achieved the highest average score of 63.97% across all PathMMU subsets.

02

The evaluation covers diverse PathMMU subsets, including PubMed, SocialPath, and EduContent.

03

The study provides a contamination-free assessment of VLMs in a medical imaging context.
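As a rough illustration of how an average score across benchmark subsets can be computed for multiple-choice questions, the sketch below scores each subset separately and then takes an unweighted mean of the per-subset accuracies. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the function names, the record layout, and the toy predictions are all hypothetical, and the averaging scheme actually used in the study is not specified in this summary.

```python
from collections import defaultdict


def subset_accuracies(records):
    """Return accuracy (in percent) per subset.

    `records` is an iterable of (subset, predicted_choice, correct_choice)
    tuples. This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, predicted, answer in records:
        total[subset] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if predicted.strip().upper() == answer.strip().upper():
            correct[subset] += 1
    return {name: 100.0 * correct[name] / total[name] for name in total}


def macro_average(per_subset):
    """Unweighted mean of per-subset accuracies (assumed averaging scheme)."""
    return sum(per_subset.values()) / len(per_subset)


# Made-up toy predictions; these are not results from the paper.
records = [
    ("PubMed", "A", "A"),
    ("PubMed", "B", "C"),
    ("SocialPath", "D", "D"),
    ("SocialPath", "a", "A"),
    ("EduContent", "B", "B"),
    ("EduContent", "C", "A"),
    ("EduContent", "C", "C"),
    ("EduContent", "D", "D"),
]

scores = subset_accuracies(records)
average = macro_average(scores)
print(scores)
print(f"average: {average:.2f}%")
```

An unweighted mean treats every subset equally regardless of size; weighting by the number of questions per subset would give a different figure when subsets differ in size.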

Abstract

The emerging trend of vision language models (VLMs) has introduced a new paradigm in artificial intelligence (AI). However, their evaluation has predominantly focused on general-purpose datasets, providing a limited understanding of their effectiveness in specialized domains. Medical imaging, particularly digital pathology, could significantly benefit from VLMs for histological interpretation and diagnosis, giving pathologists a complementary tool for faster, more comprehensive reporting and efficient healthcare service. In this work, we are interested in benchmarking VLMs on histopathology image understanding. We present an extensive evaluation of recent VLMs on the PathMMU dataset, a domain-specific benchmark that includes subsets such as PubMed, SocialPath, and EduContent. These datasets feature diverse formats, notably multiple-choice questions (MCQs), designed to aid…


Figures: 7



Taxonomy

Topics: AI in cancer detection · Multimodal Machine Learning Applications · Digital Imaging for Blood Diseases