Towards Faithful Neural Network Intrinsic Interpretation with Shapley Additive Self-Attribution

Ying Sun; Hengshu Zhu; Hui Xiong

arXiv:2309.15559·cs.LG·September 28, 2023

TL;DR

This paper introduces SASANet, a self-attributing neural network whose attribution values are theoretically guaranteed to equal the Shapley values of its output, yielding more faithful and efficient interpretations.

Contribution

The paper formulates a generic Additive Self-Attribution (ASA) framework and, within it, proposes SASANet, a network whose self-attribution values are theoretically guaranteed to equal the Shapley values of its output, without sacrificing predictive performance.

Findings

01. SASANet outperforms existing self-attributing models.

02. SASANet rivals black-box models in accuracy.

03. SASANet provides more precise and efficient interpretations.

Abstract

Self-interpreting neural networks have garnered significant research interest. Existing works in this domain often (1) lack a solid theoretical foundation ensuring genuine interpretability or (2) compromise model expressiveness. In response, we formulate a generic Additive Self-Attribution (ASA) framework. Observing the absence of Shapley values in Additive Self-Attribution, we propose the Shapley Additive Self-Attributing Neural Network (SASANet), with a theoretical guarantee that its self-attribution values equal the output's Shapley values. Specifically, SASANet uses a marginal contribution-based sequential schema and internal distillation-based training strategies to model meaningful outputs for any number of features, resulting in an unapproximated, meaningful value function. Our experimental results indicate SASANet surpasses existing self-attributing models in performance and rivals…
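To make the Shapley terminology in the abstract concrete, here is a minimal sketch of the textbook Shapley value, computed by averaging each feature's marginal contribution over every feature ordering. It is not SASANet's architecture or training procedure; the function names `shapley_values` and `toy_value` and the toy value function itself are assumptions invented for illustration.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values via marginal contributions over all orderings.

    value_fn maps a frozenset of feature indices to a number. The cost grows
    factorially with n_features, so this is only usable for tiny examples.
    """
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = frozenset()
        prev = value_fn(present)
        for i in order:
            # Add feature i to the features seen so far and record
            # how much the value function changes.
            present = present | {i}
            cur = value_fn(present)
            totals[i] += cur - prev
            prev = cur
    return [t / len(orderings) for t in totals]


def toy_value(subset):
    """Hypothetical value function (an assumption, not from the paper):
    feature 0 adds 2, feature 1 adds 1, and features 0 and 2 together add 3."""
    v = 0.0
    if 0 in subset:
        v += 2.0
    if 1 in subset:
        v += 1.0
    if 0 in subset and 2 in subset:
        v += 3.0
    return v


phi = shapley_values(toy_value, 3)
full = toy_value(frozenset({0, 1, 2}))
empty = toy_value(frozenset())
print(phi, sum(phi), full - empty)
```

The attributions add up to the difference between the value on all features and the value on none, which is the additive property that an additive self-attribution is meant to expose. The interaction term between features 0 and 2 is split evenly between them.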

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning