Enhancing Interpretability for Vision Models via Shapley Value Optimization
Kanglong Fan, Yunqiao Yang, Chen Ma

TL;DR
This paper introduces a self-explaining neural network framework that uses Shapley value estimation during training to improve interpretability without sacrificing performance, outperforming existing explanation methods.
Contribution
A novel self-explaining framework integrating Shapley value estimation as an auxiliary task, ensuring faithful and interpretable model explanations with minimal structural changes.
Findings
Achieves state-of-the-art interpretability on multiple benchmarks.
Maintains high model performance and compatibility.
Provides fair attribution of prediction scores to image patches.
Abstract
Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods aim to make DNN behavior transparent, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behaviors, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model prediction scores to image patches, ensuring explanations inherently align with the model's decision logic, and 2) enhanced interpretability with minor structural modifications, preserving model…
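To make the idea of a "fair allocation of the model prediction scores to image patches" concrete, here is a minimal sketch of the classical Shapley value computed exactly over a handful of patches. It is not the paper's training procedure: the function `shapley_values`, the toy scoring function `toy_value`, and the numbers in `PATCH_SCORES` are all hypothetical, chosen only to illustrate the definition. Exact enumeration is exponential in the number of patches, which is why methods in this area estimate the values rather than compute them.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value_fn):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for size in range(len(others) + 1):
            # Classical Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size)
                * factorial(n_players - size - 1)
                / factorial(n_players)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of patch i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical per-patch contributions standing in for a model's score.
PATCH_SCORES = [0.5, 0.2, 0.0, 0.1]


def toy_value(coalition):
    """Toy 'prediction score' when only the patches in `coalition` are visible."""
    score = sum(PATCH_SCORES[p] for p in coalition)
    # Assumed interaction: patches 0 and 1 are jointly worth an extra 0.2.
    if 0 in coalition and 1 in coalition:
        score += 0.2
    return score


phi = shapley_values(len(PATCH_SCORES), toy_value)
full = toy_value(frozenset(range(len(PATCH_SCORES))))
empty = toy_value(frozenset())
print("attributions:", [round(v, 6) for v in phi])
print("sum of attributions:", round(sum(phi), 6), "full minus empty:", full - empty)
```

In this toy setup the interaction bonus is split evenly between patches 0 and 1, the patch that never changes the score receives zero, and the attributions sum to the difference between the full-image score and the empty-image score. These are the symmetry, null-player, and efficiency properties that motivate calling the allocation fair.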
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Multimodal Machine Learning Applications
