Issues with post-hoc counterfactual explanations: a discussion

Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala and Marcin Detyniecki

arXiv:1906.04774 · cs.LG · June 13, 2019 · 21 citations

TL;DR

This paper critically examines the limitations of post-hoc counterfactual explanations for black-box classifiers, highlighting the risks created by the assumptions these methods make about the data and the classifier, and the importance of three properties: proximity, connectedness and stability.
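To make the setting concrete, here is a minimal sketch of what a post-hoc counterfactual method does: it queries a trained classifier only through its predictions and searches for the closest point that receives a different class. The classifier `black_box`, the helper `find_counterfactual` and its parameters are hypothetical, and the annulus-based random search is an assumed illustration, not the authors' method.

```python
import math
import random


def black_box(x):
    # Hypothetical black-box classifier: only its predictions are observable.
    # Class 1 inside the unit circle, class 0 outside.
    return 1 if x[0] ** 2 + x[1] ** 2 < 1.0 else 0


def find_counterfactual(predict, x, step=0.05, n_steps=100, n_samples=200, seed=0):
    """Return the closest sampled 2-D point whose predicted class differs from x.

    Points are drawn in annuli of growing radius around x; only `predict` is
    called, so the search is post-hoc and model-agnostic. Returns None if no
    class change is found within a radius of n_steps * step.
    """
    rng = random.Random(seed)
    original = predict(x)
    for k in range(1, n_steps + 1):
        candidates = []
        for _ in range(n_samples):
            # Sample a radius inside the current annulus and a random direction.
            r = rng.uniform((k - 1) * step, k * step)
            angle = rng.uniform(0.0, 2.0 * math.pi)
            c = (x[0] + r * math.cos(angle), x[1] + r * math.sin(angle))
            if predict(c) != original:
                candidates.append(c)
        if candidates:
            # Keep the candidate closest to x within the first successful annulus.
            return min(candidates, key=lambda p: math.dist(p, x))
    return None


cf = find_counterfactual(black_box, (0.0, 0.0))
print(cf, black_box(cf))
```

Because the search only sees predictions, nothing guarantees that the returned point lies near real data or is linked to it; this is the kind of assumption the paper examines.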

Contribution

The paper discusses the key properties required for reliable counterfactual explanations, presents approaches to quantify them, and illustrates the risks that arise when these properties are not satisfied.

Findings

01. Post-hoc counterfactual explanations can fail to satisfy desirable properties.

02. The assumptions that post-hoc approaches make about the data and the classifier carry risks.

03. Ensuring proximity, connectedness and stability is crucial for reliable explanations.
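Proximity and connectedness can be quantified in several ways. The sketch below assumes two simplified scores, which are illustrations and not necessarily the paper's exact definitions: a local-outlier-style proximity ratio, and a graph-based version of an ε-chain check that asks whether the counterfactual can be linked to a correctly classified training instance of its predicted class through same-class training points at most ε apart. The classifier `predict`, the data `TRAIN` and both helper functions are hypothetical.

```python
import math
from collections import deque


def predict(x):
    # Hypothetical black-box classifier: class 1 when x0 + x1 > 1.
    return 1 if x[0] + x[1] > 1.0 else 0


# Hypothetical labelled training data: (features, true label).
TRAIN = [
    ((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
    ((1.0, 1.0), 1), ((1.2, 0.9), 1), ((0.9, 1.3), 1),
]


def proximity(e, train, predict):
    """Distance from e to the closest training instance a0 of e's predicted
    class, divided by the distance from a0 to its own nearest training
    neighbour. Values well above 1 suggest e lies far from the data.
    Assumes at least one training instance is predicted in e's class."""
    c = predict(e)
    same_class = [x for x, _ in train if predict(x) == c]
    a0 = min(same_class, key=lambda x: math.dist(x, e))
    nn_dist = min(math.dist(a0, x) for x, _ in train if x != a0)
    return math.dist(e, a0) / nn_dist


def is_connected(e, train, predict, eps):
    """Breadth-first search from e over training points with the same
    predicted class, hopping only between points at most eps apart.
    Returns True if a correctly classified instance of that class is reached."""
    c = predict(e)
    nodes = [x for x, _ in train if predict(x) == c]
    targets = {x for x, y in train if y == c and predict(x) == c}
    queue, seen = deque([e]), {e}
    while queue:
        p = queue.popleft()
        if p in targets:
            return True
        for q in nodes:
            if q not in seen and math.dist(p, q) <= eps:
                seen.add(q)
                queue.append(q)
    return False


for e in [(0.8, 0.7), (3.0, 3.0)]:
    print(e, predict(e), proximity(e, TRAIN, predict), is_connected(e, TRAIN, predict, eps=0.5))
```

Both test points receive class 1. The point (0.8, 0.7) sits near class-1 training data (proximity ≈ 1.61) and is connected to it, while (3.0, 3.0) lies far from every training instance (proximity ≈ 8.54) and is not connected for ε = 0.5.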

Abstract

Counterfactual post-hoc interpretability approaches have been proven to be useful tools to generate explanations for the predictions of a trained blackbox classifier. However, the assumptions they make about the data and the classifier make them unreliable in many contexts. In this paper, we discuss three desirable properties and approaches to quantify them: proximity, connectedness and stability. In addition, we illustrate that there is a risk for post-hoc counterfactual approaches to not satisfy these properties.
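Stability asks whether similar instances receive similar explanations. One common way to quantify it, assumed here rather than taken from the paper, is a local Lipschitz-style ratio: the largest change in the counterfactual divided by the change in the input, over a few neighbours of the instance. The toy method `nearest_enemy` (return the closest instance from a fixed pool that the classifier assigns to another class), the classifier `predict` and the pool `POOL` are hypothetical.

```python
import math


def predict(x):
    # Hypothetical black-box classifier: class 1 when x0 > 0.
    return 1 if x[0] > 0.0 else 0


# Hypothetical pool of instances used as candidate counterfactuals.
POOL = [(1.0, 2.0), (1.0, -2.0), (-1.0, 0.0)]


def nearest_enemy(x):
    """Toy counterfactual method: the closest pool instance whose
    predicted class differs from the prediction for x."""
    c = predict(x)
    enemies = [p for p in POOL if predict(p) != c]
    return min(enemies, key=lambda p: math.dist(p, x))


def instability(explain, x, neighbours):
    """Largest change in the explanation per unit change of the input,
    over the given neighbours of x (0 means perfectly stable locally)."""
    ex = explain(x)
    return max(math.dist(ex, explain(n)) / math.dist(x, n) for n in neighbours)


print(instability(nearest_enemy, (-0.5, 1.0), [(-0.5, 1.02), (-0.5, 0.98), (-0.48, 1.0)]))
print(instability(nearest_enemy, (-0.5, 0.01), [(-0.5, -0.01)]))
```

Around (-0.5, 1.0) every neighbour gets the same counterfactual, so the ratio is 0. Around (-0.5, 0.01), a vertical shift of 0.02 flips the nearest enemy from (1.0, 2.0) to (1.0, -2.0), giving a ratio of about 4 / 0.02 = 200: two almost identical instances receive very different explanations.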

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning

Methods: Interpretability