XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography
Haozhe Luo, Shelley Zixin Shu, Ziyu Zhou, Sebastian Otalora, Mauricio Reyes

TL;DR
XBench is a new benchmark that evaluates how well vision-language models align textual concepts with visual evidence in chest X-ray images. It finds reasonable localization for large, well-defined pathologies and substantially weaker localization for small or diffuse lesions.
Contribution
This work introduces the first systematic benchmark for assessing cross-modal interpretability of CLIP-style VLMs in chest radiography. It evaluates seven model variants by comparing their cross-attention and similarity-based localization maps against radiologist-annotated regions.
Findings
Models perform well on large, well-defined pathologies but poorly on small or diffuse lesions.
Pretraining on chest X-ray datasets improves model alignment with radiologist annotations.
Recognition ability correlates strongly with grounding performance.
Abstract
Vision-language models (VLMs) have recently shown remarkable zero-shot performance in medical image understanding, yet their grounding ability, the extent to which textual concepts align with visual evidence, remains underexplored. In the medical domain, reliable grounding is essential for interpretability and clinical adoption. In this work, we present the first systematic benchmark for evaluating cross-modal interpretability in chest X-rays across seven CLIP-style VLM variants. We generate visual explanations using cross-attention and similarity-based localization maps, and quantitatively assess their alignment with radiologist-annotated regions across multiple pathologies. Our analysis reveals that: (1) while all VLM variants demonstrate reasonable localization for large and well-defined pathologies, their performance substantially degrades for small or diffuse lesions; (2)…
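To make the evaluation idea in the abstract concrete, the following is a minimal sketch of a similarity-based localization map and two common ways of scoring it against an annotated region. It is an illustration under stated assumptions, not the authors' implementation: the function names (`similarity_map`, `upsample_nearest`, `iou`, `pointing_game_hit`), the 0.5 threshold, and the choice of IoU and pointing-game metrics are hypothetical, since this summary does not specify which metrics or thresholds XBench uses. Synthetic embeddings stand in for the patch and text features a CLIP-style model would produce.

```python
import numpy as np

def similarity_map(patch_emb, text_emb, grid_hw):
    """Cosine similarity of each patch embedding to one text embedding,
    min-max normalised to [0, 1] and reshaped to the patch grid."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = p @ t  # shape: (num_patches,)
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
    return sim.reshape(grid_hw)

def upsample_nearest(heatmap, out_hw):
    """Nearest-neighbour upsampling of a patch-grid map to image resolution."""
    out_h, out_w = out_hw
    h, w = heatmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return heatmap[rows][:, cols]

def iou(heatmap, mask, threshold=0.5):
    """Intersection over union between the thresholded map and a binary mask.
    The threshold value is an assumption made for this sketch."""
    pred = heatmap >= threshold
    mask = mask.astype(bool)
    union = np.logical_or(pred, mask).sum()
    return float(np.logical_and(pred, mask).sum() / union) if union else 0.0

def pointing_game_hit(heatmap, mask):
    """True if the map's maximum falls inside the annotated region."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return bool(mask[r, c])

# Synthetic stand-in for model output: a 4x4 patch grid with 8-dim embeddings.
dim, grid = 8, (4, 4)
text_emb = np.eye(dim)[0]                      # hypothetical "pathology" text embedding
patch_emb = np.tile(np.eye(dim)[1], (16, 1))   # patches unrelated to the text
for idx in (0, 1, 4, 5):                       # top-left 2x2 patches match the text
    patch_emb[idx] = np.eye(dim)[0]

heat = upsample_nearest(similarity_map(patch_emb, text_emb, grid), (8, 8))

exact_mask = np.zeros((8, 8), dtype=bool)
exact_mask[:4, :4] = True                      # annotation equal to the highlighted area
wide_mask = np.zeros((8, 8), dtype=bool)
wide_mask[:4, :] = True                        # annotation twice as large

print(iou(heat, exact_mask), iou(heat, wide_mask), pointing_game_hit(heat, exact_mask))
```

With a real model, `patch_emb` and `text_emb` would come from the image and text encoders, and the masks from radiologist annotations. A thresholded overlap score and a peak-location check capture different failure modes, which matters for the small or diffuse lesions where the abstract reports degraded localization.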
Taxonomy
Topics: COVID-19 diagnosis using AI · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
