XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography

Haozhe Luo; Shelley Zixin Shu; Ziyu Zhou; Sebastian Otalora; Mauricio Reyes

arXiv:2510.19599 · cs.CV · October 23, 2025

Open Access

TL;DR

XBench is a new benchmark that evaluates how well vision-language models align textual explanations with visual evidence in chest X-ray images, revealing strengths and limitations for clinical interpretability.

Contribution

This work introduces the first systematic benchmark for assessing cross-modal interpretability of VLMs in chest radiography, including evaluation methods and analysis of model performance.

Findings

01 Models perform well on large, well-defined pathologies but poorly on small or diffuse lesions.

02 Pretraining on chest X-ray datasets improves model alignment with radiologist annotations.

03 Recognition ability correlates strongly with grounding performance.

Abstract

Vision-language models (VLMs) have recently shown remarkable zero-shot performance in medical image understanding, yet their grounding ability, the extent to which textual concepts align with visual evidence, remains underexplored. In the medical domain, however, reliable grounding is essential for interpretability and clinical adoption. In this work, we present the first systematic benchmark for evaluating cross-modal interpretability in chest X-rays across seven CLIP-style VLM variants. We generate visual explanations using cross-attention and similarity-based localization maps, and quantitatively assess their alignment with radiologist-annotated regions across multiple pathologies. Our analysis reveals that: (1) while all VLM variants demonstrate reasonable localization for large and well-defined pathologies, their performance substantially degrades for small or diffuse lesions; (2)…
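To make the evaluation idea in the abstract concrete, the following is a minimal sketch of a similarity-based localization map scored against an annotated region. It is not the authors' implementation. The function names, the toy embeddings, the rectangular annotation, and the two scores used here (a pointing-game hit and IoU at a fixed threshold) are all illustrative assumptions; the excerpt above does not state which metrics or map-extraction details the benchmark uses.

```python
import numpy as np


def similarity_map(patch_emb, text_emb, grid_hw):
    """Cosine similarity between each patch embedding and one text embedding.

    patch_emb: (H*W, D) array of patch embeddings in row-major grid order.
    text_emb:  (D,) embedding of the pathology prompt.
    Returns an (H, W) map min-max normalized to [0, 1].
    """
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = (p @ t).reshape(grid_hw)
    lo, hi = sim.min(), sim.max()
    return (sim - lo) / (hi - lo + 1e-8)


def box_mask(grid_hw, box):
    """Boolean mask for a box (row0, col0, row1, col1), end-exclusive."""
    mask = np.zeros(grid_hw, dtype=bool)
    r0, c0, r1, c1 = box
    mask[r0:r1, c0:c1] = True
    return mask


def pointing_game_hit(heatmap, mask):
    """True if the map's maximum falls inside the annotated region."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return bool(mask[r, c])


def iou_at_threshold(heatmap, mask, thr=0.5):
    """IoU between the thresholded map and the annotated region."""
    pred = heatmap >= thr
    inter = np.logical_and(pred, mask).sum()
    union = np.logical_or(pred, mask).sum()
    return float(inter / union) if union > 0 else 0.0


# Toy example with synthetic embeddings (hypothetical, not real model output):
# patches inside the annotated box match the text direction, others are orthogonal.
grid_hw = (4, 4)
mask = box_mask(grid_hw, (1, 1, 3, 3))
text_emb = np.zeros(8)
text_emb[0] = 1.0
patch_emb = np.zeros((16, 8))
patch_emb[:, 1] = 1.0                 # background patches: orthogonal to the text
patch_emb[mask.ravel()] = text_emb    # patches inside the box: aligned with the text

heatmap = similarity_map(patch_emb, text_emb, grid_hw)
print("pointing-game hit:", pointing_game_hit(heatmap, mask))
print("IoU at 0.5:", iou_at_threshold(heatmap, mask))
```

In practice the patch and text embeddings would come from the image and text encoders of a CLIP-style model, and the low-resolution map would typically be upsampled to image resolution before comparison with the radiologist annotation. Small or diffuse lesions cover few grid cells, which is one plausible reason a coarse patch-level map scores poorly on them.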

Taxonomy

Topics: COVID-19 diagnosis using AI · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications