Explaining Deep Neural Networks

Oana-Maria Camburu

arXiv:2010.01496 · cs.CL · October 15, 2021 · 22 cites

Open Access

TL;DR

This thesis investigates two directions for explaining deep neural networks: feature-based post-hoc explanations of already trained models, and self-explanatory models that generate natural language explanations for their own predictions.

Contribution

It investigates two major directions for explaining neural networks: post-hoc feature explanations and self-explanatory models with built-in explanation generation.

Findings

01. Analyzes feature-based post-hoc explanation methods.

02. Examines self-explanatory neural models with natural language outputs.

03. Highlights the importance of interpretability in critical domains such as healthcare, finance, and law.

Abstract

Deep neural networks are becoming more and more popular due to their revolutionary success in diverse areas, such as computer vision, natural language processing, and speech recognition. However, the decision-making processes of these models are generally not interpretable to users. In various domains, such as healthcare, finance, or law, it is critical to know the reasons behind a decision made by an artificial intelligence system. Therefore, several directions for explaining neural models have recently been explored. In this thesis, I investigate two major directions for explaining deep neural networks. The first direction consists of feature-based post-hoc explanatory methods, that is, methods that aim to explain an already trained and fixed model (post-hoc), and that provide explanations in terms of input features, such as tokens for text and superpixels for images (feature-based).…
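
To make the feature-based post-hoc idea in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method studied in the thesis. The fixed model is a hypothetical toy bag-of-words scorer (`WEIGHTS` and `predict_proba` are made up for this example), and the attribution technique is plain leave-one-out occlusion, one simple option among many: each token is scored by how much the predicted probability changes when that token is removed from the input.

```python
import math

# Hypothetical stand-in for an already trained, fixed model: a bag-of-words
# sentiment scorer with hard-coded weights (illustrative values only).
WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "not": -0.5}


def predict_proba(tokens):
    """Return the probability of the positive class for a list of tokens."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def leave_one_out(tokens):
    """Score each token by the drop in probability when it is removed.

    The model is only queried, never modified, which is what makes the
    explanation post-hoc. The scores are per input token, which is what
    makes it feature-based.
    """
    base = predict_proba(tokens)
    return [
        (tok, base - predict_proba(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


attributions = leave_one_out(["the", "film", "was", "great", "but", "boring"])
for token, delta in attributions:
    print(f"{token:>7s} {delta:+.3f}")
```

With these toy weights, `great` receives a positive attribution, `boring` a negative one, and tokens the scorer ignores receive zero.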

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling