Towards Faithful Neural Network Intrinsic Interpretation with Shapley Additive Self-Attribution
Ying Sun, Hengshu Zhu, Hui Xiong

TL;DR
This paper introduces SASANet, a neural network model with a theoretical foundation ensuring its self-attributions match Shapley values, leading to more faithful and efficient interpretability.
Contribution
It proposes the SASANet framework that guarantees self-attribution values align with Shapley values, enhancing interpretability with theoretical guarantees and improved performance.
Findings
SASANet outperforms existing self-attributing models.
SASANet rivals black-box models in accuracy.
SASANet provides more precise and efficient interpretations.
Abstract
Self-interpreting neural networks have garnered significant interest in research. Existing works in this domain often (1) lack a solid theoretical foundation ensuring genuine interpretability or (2) compromise model expressiveness. In response, we formulate a generic Additive Self-Attribution (ASA) framework. Observing the absence of Shapley value in Additive Self-Attribution, we propose Shapley Additive Self-Attributing Neural Network (SASANet), with theoretical guarantees for the self-attribution value equal to the output's Shapley values. Specifically, SASANet uses a marginal contribution-based sequential schema and internal distillation-based training strategies to model meaningful outputs for any number of features, resulting in un-approximated meaningful value function. Our experimental results indicate SASANet surpasses existing self-attributing models in performance and rivals…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning
