PathGLS: Evaluating Pathology Vision-Language Models without Ground Truth through Multi-Dimensional Consistency

Minbing Chen; Zhu Meng; Fei Su

arXiv:2603.16113 · cs.CV · March 18, 2026

TL;DR

PathGLS is a multi-dimensional, reference-free evaluation framework for pathology vision-language models. It detects subtle failures such as hallucinations and correlates with clinical error hierarchies (Spearman's rank correlation of 0.71), which supports safer deployment.

Contribution

The paper presents PathGLS, a novel multi-dimensional evaluation method for pathology VLMs that does not require ground truth and assesses grounding, logic, and stability.

Findings

01. PathGLS registers a 40.2% sensitivity drop on hallucinated outputs, outperforming BERTScore.

02. PathGLS achieves a Spearman's rank correlation of 0.71 with clinical error hierarchies.

03. PathGLS outperforms LLM-based approaches in robustness and reliability.
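For readers unfamiliar with the statistic in finding 02, the following is a minimal sketch of Spearman's rank correlation in plain Python. It is not the authors' evaluation code: the function names `average_ranks` and `spearman_rho` are hypothetical, and the toy `severity` and `score` values are invented purely for illustration and do not come from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


# Toy data (invented): error severity levels and a metric score per report.
severity = [0, 1, 2, 3, 4]
score = [0.9, 0.8, 0.6, 0.5, 0.1]
rho = spearman_rho(severity, score)
```

In this toy case the score falls monotonically as severity rises, so `rho` is -1. The sign convention (whether a metric is expected to rise or fall with severity) is an assumption of the example, not something the summary above specifies.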

Abstract

Vision-Language Models (VLMs) offer significant potential in computational pathology by enabling interpretable image analysis, automated reporting, and scalable decision support. However, their widespread clinical adoption remains limited due to the absence of reliable, automated evaluation metrics capable of identifying subtle failures such as hallucinations. To address this gap, we propose PathGLS, a novel reference-free evaluation framework that assesses pathology VLMs across three dimensions: Grounding (fine-grained visual-text alignment), Logic (entailment graph consistency using Natural Language Inference), and Stability (output variance under adversarial visual-semantic perturbations). PathGLS supports both patch-level and whole-slide image (WSI)-level analysis, yielding a comprehensive trust score. Experiments on Quilt-1M, TCGA, REG2025, PathMMU and TCGA-Sarcoma datasets…
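The visible part of the abstract names three dimensions and a combined trust score but does not give the aggregation rule. The sketch below shows one plausible way such sub-scores could be combined. Everything in it is an assumption made for illustration: the function names, the [0, 1] score range, the variance-to-stability mapping, and the equal default weights are hypothetical and are not taken from the paper.

```python
from statistics import pvariance
from typing import Sequence, Tuple


def stability_from_scores(perturbed_scores: Sequence[float]) -> float:
    """Map the variance of scores under perturbation to [0, 1].

    Assumes each score lies in [0, 1], so the population variance is at
    most 0.25. Higher return values mean more stable outputs.
    (Hypothetical mapping, not the paper's definition.)
    """
    variance = pvariance(perturbed_scores)
    return 1.0 - min(variance / 0.25, 1.0)


def trust_score(
    grounding: float,
    logic: float,
    stability: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of three sub-scores, each assumed to lie in [0, 1].

    Equal weights are an illustrative default, not the authors' choice.
    """
    parts = (grounding, logic, stability)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("sub-scores must lie in [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, parts)) / total_weight


# Example: scores for one report under four hypothetical perturbations.
stability = stability_from_scores([0.80, 0.78, 0.82, 0.80])
overall = trust_score(grounding=0.7, logic=0.9, stability=stability)
```

A weighted mean keeps the combined score in the same [0, 1] range as its inputs and makes each dimension's contribution easy to inspect. Other aggregations (for example a minimum, so that one weak dimension dominates) would be equally reasonable under these assumptions.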

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning