PathVLM-Eval: Evaluation of open vision language models in histopathology
Nauman Ullah Gilal, Rachida Zegour, Khaled Al-Thelaya, Erdener Özer, Marco Agus, Jens Schneider, Sabri Boughorbel

TL;DR
This paper evaluates vision language models on histopathology tasks using a specialized benchmark to improve medical diagnosis and training.
Contribution
The paper introduces an extensive benchmark and evaluation framework for VLMs in histopathology, testing over 60 models.
Findings
Qwen2-VL-72B-Instruct achieved the highest average score of 63.97% across all PathMMU subsets.
The evaluation covers diverse histopathology datasets like PubMed, SocialPath, and EduContent.
The study provides a contamination-free assessment of VLMs in a medical imaging context.
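As a rough illustration of how an average score over the PathMMU subsets might be derived from multiple-choice results, the sketch below computes per-subset accuracy and then an unweighted mean across subsets. The record layout, the function names, and the toy answers are assumptions made for illustration; they are not the paper's data or code, and this summary does not say whether the reported average is weighted by subset or by question.

```python
from collections import defaultdict


def subset_accuracies(records):
    """Return accuracy (in percent) per subset.

    records: iterable of (subset, predicted_option, correct_option) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, predicted, answer in records:
        total[subset] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if predicted.strip().upper() == answer.strip().upper():
            correct[subset] += 1
    return {name: 100.0 * correct[name] / total[name] for name in total}


def macro_average(per_subset):
    """Unweighted mean over subsets (an assumed averaging scheme)."""
    return sum(per_subset.values()) / len(per_subset)


# Toy predictions, not real benchmark results.
records = [
    ("PubMed", "A", "A"),
    ("PubMed", "B", "C"),
    ("SocialPath", "d", "D"),
    ("EduContent", "B", "B"),
    ("EduContent", "A", "C"),
    ("EduContent", "C", "C"),
]

scores = subset_accuracies(records)
average = macro_average(scores)
```

A question-weighted average would instead divide the total number of correct answers by the total number of questions, which gives a different figure whenever the subsets differ in size.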
Abstract
The emerging trend of vision language models (VLMs) has introduced a new paradigm in artificial intelligence (AI). However, their evaluation has predominantly focused on general-purpose datasets, providing a limited understanding of their effectiveness in specialized domains. Medical imaging, particularly digital pathology, could significantly benefit from VLMs for histological interpretation and diagnosis, giving pathologists a complementary tool for faster, more comprehensive reporting and more efficient healthcare services. In this work, we benchmark VLMs on histopathology image understanding. We present an extensive evaluation of recent VLMs on the PathMMU dataset, a domain-specific benchmark that includes subsets such as PubMed, SocialPath, and EduContent. These datasets feature diverse formats, notably multiple-choice questions (MCQs), designed to aid…
Taxonomy
Topics: AI in cancer detection · Multimodal Machine Learning Applications · Digital Imaging for Blood Diseases
