Disagreement amongst counterfactual explanations: How transparency can be deceptive
Dieter Brughmans, Lissa Melis, David Martens

TL;DR
This paper empirically investigates disagreement among counterfactual explanations in XAI. It shows that different generation methods often produce very different explanations for the same instance, a variability that malicious agents can exploit, and it argues for transparency about this diversity of explanations.
Contribution
It provides the first large-scale empirical assessment of disagreement in counterfactual explanations across multiple datasets and methods.
Findings
High levels of disagreement among counterfactual explanation methods
Malicious agents can exploit this disagreement by choosing explanations that hide or include specific (e.g. sensitive) features
Dataset characteristics and algorithm type influence disagreement
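To make "disagreement" concrete, the sketch below compares the counterfactuals returned by several methods for one instance. It uses one simple measure that we assume for illustration: 1 minus the Jaccard similarity of the sets of features each counterfactual changes. This is not necessarily the metric used in the paper. The applicant, the feature names and the `method_*` outputs are hypothetical.

```python
from itertools import combinations


def changed_features(instance, counterfactual, tol=1e-9):
    """Names of the features whose value differs between the instance and the counterfactual."""
    return {
        name
        for name, value in instance.items()
        if abs(counterfactual[name] - value) > tol
    }


def feature_disagreement(cf_a, cf_b, instance):
    """1 - Jaccard similarity of the changed-feature sets: 0 = same features, 1 = no overlap."""
    a = changed_features(instance, cf_a)
    b = changed_features(instance, cf_b)
    union = a | b
    if not union:  # neither counterfactual changes anything
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_disagreement(counterfactuals, instance):
    """Average feature disagreement over all pairs of methods' counterfactuals."""
    cfs = list(counterfactuals.values())
    if len(cfs) < 2:
        raise ValueError("need at least two counterfactuals to measure disagreement")
    pairs = list(combinations(cfs, 2))
    return sum(feature_disagreement(x, y, instance) for x, y in pairs) / len(pairs)


# Hypothetical loan applicant and the counterfactual each (hypothetical) method returns.
instance = {"income": 30000, "age": 25, "debt": 5000}
counterfactuals = {
    "method_a": {"income": 40000, "age": 25, "debt": 5000},  # changes income
    "method_b": {"income": 30000, "age": 25, "debt": 1000},  # changes debt
    "method_c": {"income": 35000, "age": 25, "debt": 3000},  # changes income and debt
}

print(feature_disagreement(counterfactuals["method_a"], counterfactuals["method_b"], instance))  # 1.0
print(round(mean_pairwise_disagreement(counterfactuals, instance), 3))  # 0.667
```

All three counterfactuals here are plausible "valid" explanations, yet `method_a` and `method_b` point at entirely different features. That is the kind of disagreement the findings describe.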
Abstract
Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations has led to a boom in algorithms that generate them. However, different algorithms do not necessarily produce the same explanation for the same instance. Although multiple possible explanations are beneficial in some contexts, there are circumstances where diversity amongst counterfactual explanations creates a potential disagreement problem among stakeholders. Ethical issues arise when, for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide begin to include the right to explanation for data-driven, high-stakes decisions in their policies,…
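The fairwashing scenario can be illustrated with a small sketch. We assume, hypothetically, that several methods each return one counterfactual and that an honest reporter would show the sparsest one (fewest changed features). A malicious reporter instead restricts the choice to counterfactuals that leave a sensitive feature untouched. The feature names, the `gender` encoding and the `method_*` outputs are illustrative assumptions, not data or code from the paper.

```python
def changed_features(instance, counterfactual, tol=1e-9):
    """Names of the features whose value differs between the instance and the counterfactual."""
    return {name for name, value in instance.items() if abs(counterfactual[name] - value) > tol}


def sparsest(counterfactuals, instance):
    """Name of the counterfactual that changes the fewest features (first one wins on ties)."""
    return min(counterfactuals, key=lambda n: len(changed_features(instance, counterfactuals[n])))


def fairwashed_choice(counterfactuals, instance, sensitive):
    """Sparsest counterfactual among those that leave `sensitive` untouched, or None if none exists."""
    hiding = {
        name: cf
        for name, cf in counterfactuals.items()
        if sensitive not in changed_features(instance, cf)
    }
    return sparsest(hiding, instance) if hiding else None


# Hypothetical applicant; "gender" plays the role of the sensitive feature.
instance = {"income": 30000, "gender": 0, "debt": 5000}
counterfactuals = {
    "method_a": {"income": 30000, "gender": 1, "debt": 5000},  # flips only the sensitive feature
    "method_b": {"income": 38000, "gender": 0, "debt": 2000},  # changes income and debt
    "method_c": {"income": 45000, "gender": 0, "debt": 5000},  # changes income only
}

print(sparsest(counterfactuals, instance))                     # method_a
print(fairwashed_choice(counterfactuals, instance, "gender"))  # method_c
```

The honest choice reveals that flipping `gender` alone changes the decision, while the malicious choice reports an equally sparse, genuinely valid counterfactual that never mentions it. Every reported explanation is "true", which is why diversity among counterfactuals can make transparency deceptive.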
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
