Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Xuhong Li; Haoyi Xiong; Xingjian Li; Xuanyu Wu; Xiao Zhang; Ji Liu; Jiang Bian; Dejing Dou

arXiv:2103.10689 · cs.LG · July 18, 2022 · 29 cites

TL;DR

This paper provides a comprehensive survey of interpretability in deep learning, clarifying key concepts, categorizing interpretation algorithms, evaluating their performance, and discussing their connection to model robustness and trustworthiness.

Contribution

It offers a new taxonomy of interpretation algorithms, clarifies core concepts, and reviews evaluation metrics and trustworthiness in deep learning interpretability research.

Findings

01 Proposed a taxonomy for interpretation algorithms

02 Reviewed metrics for evaluating interpretability

03 Discussed the link between interpretability and robustness

Abstract

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction results of deep models. In recent years, many interpretation tools have been proposed to explain or reveal how deep models make decisions. In this paper, we review this line of research and try to make a comprehensive survey. Specifically, we first introduce and clarify two basic concepts -- interpretations and interpretability -- that people usually get confused about. To address the research efforts in interpretations, we elaborate the designs of a number of interpretation algorithms, from different perspectives, by proposing a new taxonomy. Then, to understand the interpretation results, we also survey the performance metrics for evaluating…

Code & Models

Repositories

PaddlePaddle/InterpretDL (paddle, Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)