MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering

Hai-Dang Nguyen; Minh-Anh Dang; Minh-Tan Le; Minh-Tuan Le

arXiv:2510.22803 · cs.CV · October 28, 2025

TL;DR

MedXplain-VQA is an explainable medical visual question answering system that combines five explainable-AI components to produce interpretable, clinically relevant explanations of medical images. It reports substantial improvements over baseline methods on 500 PathVQA histopathology samples.

Contribution

The paper introduces MedXplain-VQA, a multi-component framework for medical VQA that pairs explainability techniques with a clinically oriented evaluation scheme, with the aim of making AI-assisted diagnosis more transparent and trustworthy.

Findings

01. Achieved a composite score of 0.683 on clinical relevance metrics.

02. Generated structured explanations averaging 57 words with clinical terminology.

03. Identified 3-5 diagnostically relevant regions per sample.
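
To make the idea of picking a handful of relevant regions per sample concrete, here is a minimal sketch of one common way to turn an attention heatmap into a few candidate regions: threshold the map, group above-threshold cells into connected blobs, and keep the strongest ones. This is a generic illustration, not the paper's region extraction method, which is not described on this page. The function name `extract_regions`, the `threshold` of 0.5, and the cap of 5 regions are all assumptions chosen for the example.

```python
from collections import deque


def extract_regions(heatmap, threshold=0.5, max_regions=5):
    """Return up to max_regions blobs of above-threshold cells, strongest first.

    heatmap is a list of rows of floats in [0, 1] (a hypothetical stand-in
    for a Grad-CAM style attention map). Each result holds a bounding box
    (x_min, y_min, x_max, y_max), the mean activation, and the cell count.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heatmap[r][c] < threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and heatmap[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [y for y, _ in cells]
            xs = [x for _, x in cells]
            regions.append({
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "score": sum(heatmap[y][x] for y, x in cells) / len(cells),
                "area": len(cells),
            })
    # Rank blobs by mean activation and keep only the top few.
    regions.sort(key=lambda region: region["score"], reverse=True)
    return regions[:max_regions]


# Toy 3x4 heatmap with two separate hot spots (made-up values).
toy_heatmap = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6],
]
toy_regions = extract_regions(toy_heatmap)
```

Ranking by mean activation is one design choice among several; ranking by area or peak value would select different regions on noisy maps.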

Abstract

Explainability is critical for the clinical adoption of medical visual question answering (VQA) systems, as physicians require transparent reasoning to trust AI-generated diagnoses. We present MedXplain-VQA, a comprehensive framework integrating five explainable AI components to deliver interpretable medical image analysis. The framework leverages a fine-tuned BLIP-2 backbone, medical query reformulation, enhanced Grad-CAM attention, precise region extraction, and structured chain-of-thought reasoning via multi-modal language models. To evaluate the system, we introduce a medical-domain-specific framework replacing traditional NLP metrics with clinically relevant assessments, including terminology coverage, clinical structure quality, and attention region relevance. Experiments on 500 PathVQA histopathology samples demonstrate substantial improvements, with the enhanced system achieving…
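
The abstract names three clinically oriented assessments (terminology coverage, clinical structure quality, and attention region relevance) without giving their formulas here. The sketch below shows one plausible way such scores could be computed and combined, purely as an illustration. The term list `MEDICAL_TERMS`, the section cues in `STRUCTURE_CUES`, the equal weights, and every function name are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical vocabulary; the paper's actual term list is not given here.
MEDICAL_TERMS = {"carcinoma", "necrosis", "inflammation",
                 "epithelium", "stroma", "nuclei"}

# Hypothetical cues that a structured clinical explanation might contain.
STRUCTURE_CUES = ("observation", "finding", "conclusion")


def terminology_coverage(text, terms=MEDICAL_TERMS):
    """Fraction of the reference terms that appear as words in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return len(tokens & terms) / len(terms)


def structure_quality(text, cues=STRUCTURE_CUES):
    """Fraction of the expected section cues found anywhere in the text."""
    lowered = text.lower()
    return sum(cue in lowered for cue in cues) / len(cues)


def composite_score(coverage, structure, attention, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the three component scores (equal weights assumed)."""
    parts = (coverage, structure, attention)
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


# Made-up explanation text and a made-up attention relevance value.
explanation = ("Observation: enlarged nuclei within the stroma. "
               "Conclusion: findings suggest carcinoma.")
coverage = terminology_coverage(explanation)
structure = structure_quality(explanation)
score = composite_score(coverage, structure, attention=0.6)
```

With these toy inputs, three of the six reference terms and all three cues are matched, so the example yields a coverage of 0.5 and a structure score of 1.0. These numbers only exercise the code and say nothing about the reported 0.683 composite score.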
