Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

Guangzhi Xiong; Qiao Jin; Sanchit Sinha; Zhiyong Lu; Aidong Zhang

arXiv:2605.20158·cs.CV·May 20, 2026

TL;DR

This paper develops a causal framework for evaluating how faithfully visual attribution methods explain LVLM reasoning on chest X-rays, and proposes MedFocus, a new concept-based attribution method.

Contribution

The paper introduces a causal evaluation framework for visual attribution in medical LVLMs and proposes MedFocus, a novel attribution method intended to make explanations more accurate and clinically relevant.

Findings

01. Existing attribution methods often fail to identify the true evidence used by LVLMs.

02. MedFocus significantly outperforms prior attribution methods in localizing clinically meaningful regions.

03. The framework and MedFocus improve the trustworthiness of model explanations in medical imaging.

Abstract

Large Vision Language Models (LVLMs) show promise in medical applications, but their inability to faithfully ground responses in visual evidence raises serious concerns about clinical trustworthiness. While visual attribution methods are widely used to explain LVLM predictions, whether these explanations actually reflect the visual evidence underlying the model's decision is largely unverified, since ground-truth annotations for internal model reasoning are typically unavailable. We address this question for chest X-ray (CXR) reasoning by developing a causal evaluation framework that retains only CXR-VQA samples for which the expert-annotated region is verified, via counterfactual editing, to be causally responsible for the model's prediction. Using this framework across 11 attribution methods, six open-source LVLMs, and two output modes (direct answer and step-by-step reasoning), we…
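
The sample-filtering idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Sample` type, `edit_region`, `causally_verified`, and `toy_model` are hypothetical names, the "counterfactual edit" is a crude constant fill rather than whatever editing procedure the paper actually uses, and the acceptance rule (keep a sample only if the answer changes after the edit) is one plausible reading of "causally responsible", not the authors' stated criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]        # toy grayscale image as nested lists
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


@dataclass
class Sample:
    image: Image
    question: str
    region: Box  # expert-annotated region (hypothetical format)


def edit_region(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Return a copy of the image with the boxed region overwritten.

    Assumption: a constant fill stands in for real counterfactual editing.
    """
    r0, c0, r1, c1 = box
    return [
        [fill if r0 <= r < r1 and c0 <= c < c1 else v for c, v in enumerate(row)]
        for r, row in enumerate(image)
    ]


def causally_verified(
    samples: List[Sample],
    answer_fn: Callable[[Image, str], str],
) -> List[Sample]:
    """Keep samples whose answer changes once the annotated region is edited."""
    kept = []
    for s in samples:
        original = answer_fn(s.image, s.question)
        edited = answer_fn(edit_region(s.image, s.region), s.question)
        if original != edited:  # the region influenced the prediction
            kept.append(s)
    return kept


def toy_model(image: Image, question: str) -> str:
    """Stand-in for an LVLM: answers 'yes' if any pixel is bright."""
    return "yes" if max(max(row) for row in image) > 0.5 else "no"
```

With `toy_model`, a sample whose annotated box covers the only bright pixel is retained, while a sample whose box covers an empty corner is dropped, because editing that corner leaves the answer unchanged. A real pipeline would replace `toy_model` with an LVLM call and would likely need a more careful edit and comparison step.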

Code & Models

Repositories

gzxiong/medfocus (GitHub)