If Only We Had Better Counterfactual Explanations: Five Key Deficits to Rectify in the Evaluation of Counterfactual XAI Techniques
Mark T Keane, Eoin M Kenny, Eoin Delaney, Barry Smyth

TL;DR
This paper reviews 100 counterfactual explanation methods in AI, highlighting significant evaluation shortcomings and proposing standardized benchmarks to improve scientific progress in counterfactual XAI.
Contribution
It identifies five key evaluation deficits in current counterfactual explanation methods and offers a roadmap with standardized benchmarks to address these issues.
Findings
Only 21% of the 100 surveyed methods have been user tested
Five key evaluation deficits currently block scientific progress in the field
A roadmap with standardised benchmark evaluations is proposed to resolve them
Abstract
In recent years, there has been an explosion of AI research on counterfactual explanations as a solution to the problem of eXplainable AI (XAI). These explanations seem to offer technical, psychological and legal benefits over other explanation techniques. We survey 100 distinct counterfactual explanation methods reported in the literature. This survey addresses the extent to which these methods have been adequately evaluated, both psychologically and computationally, and quantifies the shortfalls occurring. For instance, only 21% of these methods have been user tested. Five key deficits in the evaluation of these methods are detailed and a roadmap, with standardised benchmark evaluations, is proposed to resolve the issues arising; issues that, at present, effectively block scientific progress in this field.
