Generative Counterfactual Introspection for Explainable Deep Learning

Shusen Liu; Bhavya Kailkhura; Donald Loveland; Yong Han

arXiv:1907.03077 · cs.LG · July 9, 2019 · 32 cites

TL;DR

This paper introduces an introspection method for deep neural networks that uses a generative model to edit input images, enabling counterfactual analysis of which meaningful changes to an input would alter the model's prediction. The method is demonstrated on the MNIST and CelebA datasets.

Contribution

The paper presents a generative counterfactual introspection technique that interprets deep learning models by editing their inputs to answer counterfactual questions.

Findings

01. Revealed properties of classifiers via input editing.

02. Applied method successfully on MNIST and CelebA datasets.

03. Provided insights into model decision boundaries.

Abstract

In this work, we propose an introspection technique for deep neural networks that relies on a generative model to instigate salient editing of the input image for model interpretation. Such modification provides the fundamental interventional operation that allows us to obtain answers to counterfactual inquiries, i.e., what meaningful change can be made to the input image in order to alter the prediction. We demonstrate how to reveal interesting properties of the given classifiers by utilizing the proposed introspection approach on both the MNIST and the CelebA datasets.
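The abstract does not state the optimization objective, so the following is only a toy sketch of the general idea: search in a generator's latent space for a nearby code whose generated input changes the classifier's prediction. Everything here is an assumption made for illustration, not the paper's method: the linear "generator" (`G_W`, `G_b`), the logistic classifier (`clf_w`, `clf_b`), the cross-entropy plus proximity loss, and the weight `lam` are all hypothetical stand-ins for a trained generative model and a deep classifier.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict(z, G_W, G_b, clf_w, clf_b):
    """Classifier probability for the input generated from latent code z."""
    x = G_W @ z + G_b                  # toy linear "generator" (assumption)
    return sigmoid(clf_w @ x + clf_b)  # toy logistic classifier (assumption)

def counterfactual_latent(z0, G_W, G_b, clf_w, clf_b,
                          target=1.0, lam=0.1, lr=0.1, steps=500):
    """Gradient descent on z: push the prediction toward `target`
    while a proximity penalty keeps z close to the original code z0."""
    z = z0.copy()
    for _ in range(steps):
        p = predict(z, G_W, G_b, clf_w, clf_b)
        # Gradient of binary cross-entropy w.r.t. z, plus the gradient
        # of the proximity term lam * ||z - z0||^2.
        grad = (p - target) * (G_W.T @ clf_w) + 2.0 * lam * (z - z0)
        z -= lr * grad
    return z

# Hypothetical fixed parameters: 2-D latent space, 3-D "image" space.
G_W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
G_b = np.zeros(3)
clf_w = np.array([1.0, -1.0, 0.5])
clf_b = 0.0

z0 = np.array([-1.0, 1.0])             # latent code of the original input
z_cf = counterfactual_latent(z0, G_W, G_b, clf_w, clf_b)

p_before = predict(z0, G_W, G_b, clf_w, clf_b)
p_after = predict(z_cf, G_W, G_b, clf_w, clf_b)
x_edit = G_W @ (z_cf - z0)             # the edit applied in input space
```

Comparing `p_before` with `p_after` shows whether the prediction changed, and `x_edit` is the change to the input that caused it. In this toy setting the proximity weight `lam` trades off how far the edited input may drift from the original against how strongly the prediction is pushed toward the target.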

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare