Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond
Xuhong Li, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, Dejing Dou

TL;DR
This paper provides a comprehensive survey of interpretability in deep learning, clarifying key concepts, categorizing interpretation algorithms, evaluating their performance, and discussing their connection to model robustness and trustworthiness.
Contribution
It offers a new taxonomy of interpretation algorithms, clarifies core concepts, and reviews evaluation metrics and trustworthiness in deep learning interpretability research.
Findings
Proposed a taxonomy for interpretation algorithms
Reviewed metrics for evaluating interpretability
Discussed the link between interpretability and robustness
Abstract
Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction results of deep models. In recent years, many interpretation tools have been proposed to explain or reveal how deep models make decisions. In this paper, we review this line of research and try to make a comprehensive survey. Specifically, we first introduce and clarify two basic concepts -- interpretations and interpretability -- that people usually get confused about. To address the research efforts in interpretations, we elaborate the designs of a number of interpretation algorithms, from different perspectives, by proposing a new taxonomy. Then, to understand the interpretation results, we also survey the performance metrics for evaluating…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
