Attribution Explanations for Deep Neural Networks: A Theoretical Perspective
Huiqi Deng, Hongbin Pei, Quanshi Zhang, and Mengnan Du

TL;DR
This paper reviews recent theoretical advances in attribution explanations for deep neural networks, addressing challenges in comparing, understanding, and evaluating the faithfulness of attribution methods.
Contribution
It offers a comprehensive summary of theoretical unification, rationale, and evaluation approaches, providing insights for better method selection and development.
Findings
Unification of attribution methods enables systematic comparison.
Theoretical foundations clarify the rationale behind existing methods.
Rigorous evaluation frameworks assess faithfulness of attribution methods.
Abstract
Attribution explanation is a typical approach for explaining deep neural networks (DNNs): it infers an importance or contribution score for each input variable with respect to the final output. In recent years, numerous attribution methods have been developed to explain DNNs. However, a persistent concern remains unresolved: whether, and which, attribution methods faithfully reflect the actual contribution of input variables to the decision-making process. This faithfulness issue undermines the reliability and practical utility of attribution explanations. We argue that these concerns stem from three core challenges. First, attribution methods are difficult to compare because of their unstructured heterogeneity: they differ in heuristics, formulations, and implementations, and lack a unified organization. Second, most methods lack solid theoretical underpinnings, with their rationales…
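To make the notion of a per-variable contribution score concrete, here is a minimal sketch of one simple attribution scheme, occlusion against a baseline. It is an illustration only, not a method taken from the paper: `toy_model`, its weights, and the all-zero baseline are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence


def occlusion_attribution(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Score input i by how much f(x) changes when x[i] is replaced by baseline[i]."""
    full_output = f(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # remove the information carried by variable i
        scores.append(full_output - f(occluded))
    return scores


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical linear "network" used only for illustration.
    weights = [2.0, -1.0, 0.5]
    bias = 0.25
    return sum(w * xi for w, xi in zip(weights, x)) + bias


x = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference input
scores = occlusion_attribution(toy_model, x, baseline)
print(scores)  # [2.0, -3.0, 2.0]
```

For this linear toy model the scores equal `weights[i] * x[i]` and sum to `toy_model(x) - toy_model(baseline)`. For nonlinear DNNs, different attribution methods generally disagree on the scores, which is the faithfulness concern the abstract raises.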
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
