Bridging visual saliency and large language models for explainable deep learning in medical imaging
Paul Valery Nguezet, Elie Tagne Fute, Yusuf Brima, Benoit Martin Azanguezet, Marcellin Atemkeng

TL;DR
This paper introduces a multimodal explainability framework that combines CNNs, visual saliency, neuroanatomical mapping, and large language models to produce interpretable brain tumor diagnostic reports from MRI scans.
Contribution
The paper presents a novel pipeline that integrates visual attribution, neuroanatomical mapping, and LLMs for explainable diagnostics in medical imaging.
Findings
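InceptionResNetV2 achieved the highest classification performance of the nine CNN architectures evaluated.
Among the saliency methods (Grad-CAM, Grad-CAM++, ScoreCAM), Grad-CAM++ produced the best segmentation overlap.
Grok3 generated the most coherent diagnostic reports among the LLMs evaluated.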
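The "segmentation overlap" finding compares saliency-derived binary tumor masks against reference masks. This summary does not name the overlap metric, so the sketch below assumes the Dice coefficient and intersection-over-union (IoU), two standard choices, computed in pure Python on small binary masks. The helper `dice_and_iou` is hypothetical and is not code from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Assumption: overlap is measured with Dice/IoU; the paper summary
    does not state which metric was used.
    """
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("masks must have the same number of pixels")

    inter = sum(1 for p, t in zip(flat_p, flat_t) if p and t)  # |P ∩ T|
    size_p = sum(1 for p in flat_p if p)                        # |P|
    size_t = sum(1 for t in flat_t if t)                        # |T|

    if size_p + size_t == 0:
        return 1.0, 1.0  # both masks empty: count as perfect agreement (a convention)

    dice = 2 * inter / (size_p + size_t)       # 2|P ∩ T| / (|P| + |T|)
    iou = inter / (size_p + size_t - inter)    # |P ∩ T| / |P ∪ T|
    return dice, iou


pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
print(dice_and_iou(pred, target))  # (0.6666666666666666, 0.5)
```

Two of the three predicted pixels fall inside the three-pixel reference mask, so Dice = 4/6 and IoU = 2/4. Dice weights the intersection twice and is therefore always at least as large as IoU for the same pair of masks.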
Abstract
The opaque nature of deep learning models remains a significant barrier to their clinical adoption in medical imaging. This paper presents a multimodal explainability framework that bridges the gap between convolutional neural network (CNN) predictions and clinically actionable insights for brain tumor classification, leveraging large language models (LLMs) to deliver human-interpretable diagnostic narratives. The proposed framework operates through three coupled stages. First, nine CNN architectures are extended with a dual-output hybrid formulation that simultaneously optimises a classification head and a segmentation head, enabling spatially richer feature learning. Second, visual saliency attribution methods, namely Grad-CAM, Grad-CAM++, and ScoreCAM, are applied to generate class-discriminative heatmaps, which are subsequently refined into binary tumor masks via an adaptive…
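The saliency stage can be illustrated with a minimal pure-Python sketch of the standard Grad-CAM and Grad-CAM++ weighting rules. It assumes the activations A^k of a convolutional layer and the gradients of the class score with respect to them have already been extracted from the CNN. The Grad-CAM++ weights use the closed-form approximation common in public implementations (exponential class score, so higher-order derivatives reduce to powers of the first-order gradient). All function names are hypothetical. The sketch stops at a normalized heatmap because the paper's mask-refinement step is truncated in this abstract.

```python
def relu(x):
    return x if x > 0 else 0.0


def _combine(weights, acts):
    """Heatmap = ReLU(sum_k w_k * A^k), evaluated pixel by pixel."""
    h, w = len(acts[0]), len(acts[0][0])
    return [[relu(sum(wk * a[i][j] for wk, a in zip(weights, acts)))
             for j in range(w)] for i in range(h)]


def grad_cam(acts, grads):
    """Grad-CAM: w_k is the spatial mean of dY/dA^k."""
    weights = []
    for g in grads:
        n = len(g) * len(g[0])
        weights.append(sum(v for row in g for v in row) / n)
    return _combine(weights, acts)


def grad_cam_pp(acts, grads):
    """Grad-CAM++: w_k = sum_ij alpha_ij * ReLU(g_ij), with
    alpha_ij = g^2 / (2 g^2 + sum_ab A^k_ab * g^3)  (exponential-score approximation)."""
    weights = []
    for a, g in zip(acts, grads):
        sum_a = sum(v for row in a for v in row)
        wk = 0.0
        for row in g:
            for gij in row:
                denom = 2 * gij ** 2 + sum_a * gij ** 3
                alpha = gij ** 2 / denom if denom != 0 else 0.0
                wk += alpha * relu(gij)  # negative gradients contribute nothing
        weights.append(wk)
    return _combine(weights, acts)


def normalize(cam):
    """Scale a heatmap to [0, 1] by its peak value."""
    peak = max(v for row in cam for v in row)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in cam]


# Two toy 2x2 feature maps and the gradients of the class score w.r.t. them.
acts = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))     # [[1.0, 0.0], [0.0, 1.0]]
print(grad_cam_pp(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

In this toy case both methods agree. Grad-CAM gives the second map a negative weight, while Grad-CAM++ zeroes it because all of its gradients are negative. On real feature maps the pixel-wise alpha weighting lets Grad-CAM++ emphasize several distinct positive regions instead of averaging them away.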
