Explanation-Aware Learning for Enhanced Interpretability in Biomedical Imaging
Zubair Faruqui, Rahul Dubey

TL;DR
This paper introduces a method to incorporate explanation supervision into training deep neural networks for biomedical imaging, improving interpretability without sacrificing accuracy.
Contribution
The paper systematically analyzes how different explanation loss designs and supervision strengths affect model interpretability and predictive performance in medical image diagnosis.
Findings
Explanation supervision improves alignment of model explanations with clinical regions.
There is a trade-off between explanation quality and the strength of the explanation loss.
The framework maintains predictive accuracy while enhancing interpretability.
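One simple way to quantify how well a saliency map aligns with a clinical region is the fraction of saliency mass that falls inside an annotated mask. The sketch below is illustrative only: the function name `saliency_mass_inside`, the toy 3x3 map, and the choice of metric are assumptions, not the evaluation protocol used in the paper.

```python
# Hypothetical alignment metric (an assumption, not the paper's metric):
# the share of non-negative saliency that lies inside an annotated region.

def saliency_mass_inside(saliency, mask):
    """Return the fraction of total saliency located where mask is truthy.

    saliency: 2D list of floats (negative values are clipped to 0).
    mask:     2D list of 0/1 values with the same shape.
    """
    total = 0.0
    inside = 0.0
    for s_row, m_row in zip(saliency, mask):
        for s, m in zip(s_row, m_row):
            s = max(s, 0.0)  # ignore negative attributions
            total += s
            if m:
                inside += s
    # An all-zero map carries no evidence, so score it as 0.
    return inside / total if total > 0 else 0.0


# Toy example: most saliency sits on the two annotated pixels.
saliency = [
    [0.0, 0.2, 0.0],
    [0.1, 0.6, 0.1],
    [0.0, 0.0, 0.0],
]
mask = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
score = saliency_mass_inside(saliency, mask)
print(f"saliency mass inside mask: {score:.2f}")
```

A score near 1 means the explanation concentrates on the annotated region; a score near 0 means it concentrates elsewhere.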
Abstract
Deep neural networks for medical image diagnosis often achieve high predictive accuracy while relying on spurious or clinically irrelevant visual cues, limiting their trustworthiness in practice. Post-hoc explanation methods are widely used to visualize model decisions in the form of saliency maps; however, these explanations do not influence how models learn during training, allowing non-causal or confounding features to persist. This motivates the incorporation of explanation supervision directly into the training objective to guide model attention toward clinically meaningful regions and promote clinically grounded decision-making. This paper presents a systematic approach to integrate explanation loss into model training and analyzes how different explanation loss designs and supervision strengths influence both predictive performance and spatial faithfulness of explanations. To…
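The idea of adding explanation supervision to the training objective can be sketched as a weighted sum of a classification loss and an explanation loss. The example below is a minimal sketch under stated assumptions: the weight `lam`, the two loss designs (`outside_mass_loss`, `mse_loss`), and all function names are hypothetical and are not taken from the paper, whose exact loss formulations are not given in this summary. It computes only the scalar objective; in actual training the saliency map would have to be differentiable with respect to the model parameters.

```python
import math

# All names below are hypothetical illustrations, not the paper's definitions.

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(max(probs[label], 1e-12))

def outside_mass_loss(saliency, mask):
    """Design A (assumed): share of saliency falling outside the mask."""
    total = sum(max(s, 0.0) for row in saliency for s in row)
    outside = sum(
        max(s, 0.0)
        for s_row, m_row in zip(saliency, mask)
        for s, m in zip(s_row, m_row)
        if not m
    )
    return outside / total if total > 0 else 0.0

def mse_loss(saliency, mask):
    """Design B (assumed): mean squared error between saliency and mask."""
    diffs = [
        (s - m) ** 2
        for s_row, m_row in zip(saliency, mask)
        for s, m in zip(s_row, m_row)
    ]
    return sum(diffs) / len(diffs)

def total_loss(probs, label, saliency, mask, lam, explanation_loss):
    """Classification loss plus lam times the chosen explanation loss."""
    return cross_entropy(probs, label) + lam * explanation_loss(saliency, mask)


# Toy inputs: a two-class prediction and a 2x2 saliency map.
probs = [0.2, 0.8]
label = 1
saliency = [[0.5, 0.5], [0.0, 1.0]]
mask = [[0, 1], [0, 1]]

for lam in (0.0, 0.5, 2.0):
    a = total_loss(probs, label, saliency, mask, lam, outside_mass_loss)
    b = total_loss(probs, label, saliency, mask, lam, mse_loss)
    print(f"lam={lam}: outside-mass total={a:.4f}, mse total={b:.4f}")
```

With `lam = 0` the objective reduces to plain cross-entropy, and larger values of `lam` give the explanation term more influence relative to the classification term.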
