Issues with post-hoc counterfactual explanations: a discussion
Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, and Marcin Detyniecki

TL;DR
This paper critically examines the limitations of post-hoc counterfactual explanations for blackbox classifiers, highlighting issues with their assumptions and the importance of properties like proximity, connectedness, and stability.
Contribution
It discusses the key properties necessary for reliable counterfactual explanations and illustrates potential risks when these properties are not satisfied.
Findings
Post-hoc counterfactual explanations can fail to satisfy desirable properties.
The assumptions post-hoc approaches make about the data and the classifier make them unreliable in many contexts.
Proximity, connectedness, and stability are desirable properties for reliable explanations, and each can be quantified.
Abstract
Counterfactual post-hoc interpretability approaches have proven to be useful tools for generating explanations for the predictions of a trained blackbox classifier. However, the assumptions they make about the data and the classifier make them unreliable in many contexts. In this paper, we discuss three desirable properties and approaches to quantify them: proximity, connectedness and stability. In addition, we illustrate the risk that post-hoc counterfactual approaches fail to satisfy these properties.
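To make two of the properties named in the abstract more concrete, here is a minimal Python sketch of how they might be scored. The definitions are illustrative assumptions, not the measures used in the paper: proximity is approximated by the distance from a counterfactual to the nearest known point of the same class, and stability by the ratio between how far two counterfactuals lie apart and how far their original inputs lie apart. The function names `proximity` and `instability` are hypothetical. Connectedness is omitted because it requires a neighbourhood-graph construction that this page does not describe.

```python
import numpy as np


def proximity(counterfactual, same_class_points):
    """Distance from a counterfactual to the nearest known point of its class.

    Assumed proxy: a small value suggests the counterfactual lies close to
    observed data rather than in an unsupported region of the input space.
    """
    points = np.asarray(same_class_points, dtype=float)
    diffs = points - np.asarray(counterfactual, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))


def instability(x, x_neighbor, cf_x, cf_neighbor):
    """Ratio of counterfactual displacement to input displacement.

    Assumed proxy: a large value means two similar inputs received very
    different counterfactual explanations (lower is more stable).
    """
    input_gap = np.linalg.norm(
        np.asarray(x, dtype=float) - np.asarray(x_neighbor, dtype=float)
    )
    if input_gap == 0:
        raise ValueError("x and x_neighbor must be distinct points")
    cf_gap = np.linalg.norm(
        np.asarray(cf_x, dtype=float) - np.asarray(cf_neighbor, dtype=float)
    )
    return float(cf_gap / input_gap)


# Toy data, invented purely for illustration.
same_class = np.array([[0.0, 0.0], [3.0, 4.0]])
print(proximity([0.0, 1.0], same_class))                  # 1.0
print(instability([0, 0], [1, 0], [0, 2], [0, 5]))        # 3.0
```

In this toy example, the two inputs are one unit apart while their counterfactuals are three units apart, which the ratio reports as 3.0. Any threshold for what counts as "unstable" would have to be chosen per dataset.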
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning
Methods: Interpretability
