Counterfactual Evaluation for Explainable AI
Yingqiang Ge, Shuchang Liu, Zelong Li, Shuyuan Xu, Shijie Geng, Yunqi Li, Juntao Tan, Fei Sun, Yongfeng Zhang

TL;DR
This paper introduces a counterfactual-based methodology to evaluate the faithfulness of explanations in machine learning models, addressing biases in existing erasure-based criteria and providing more accurate assessment tools.
Contribution
It proposes a novel counterfactual reasoning approach for evaluating explanation faithfulness, along with two algorithms for finding counterfactuals in discrete and continuous scenarios, improving over traditional erasure-based methods.
Findings
Counterfactual evaluation correlates better with ground truth than erasure-based methods.
The proposed algorithms work effectively in both discrete and continuous data scenarios.
Empirical results demonstrate improved accuracy in faithfulness measurement.
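For the discrete (text) case mentioned in the findings, the underlying criterion (a faithful feature is one whose edit substantially changes the model output) can be illustrated with token substitution. This is a hedged sketch under stated assumptions, not the paper's discrete algorithm: the substitution candidates come from a hand-written, hypothetical `ALTERNATIVES` table, whereas generating proper counterfactuals is exactly what the paper's algorithms address. `token_counterfactual_score`, `token_erasure_score`, `toy_model`, and `WEIGHTS` are all hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

# A text model maps a token sequence to a scalar score.
TextModel = Callable[[Sequence[str]], float]


def token_counterfactual_score(model: TextModel, tokens: Sequence[str],
                               position: int,
                               alternatives: Dict[str, List[str]]) -> float:
    """Substitute the token at `position` with each listed alternative and
    return the largest absolute change in the model output."""
    original = model(tokens)
    best = 0.0
    for word in alternatives.get(tokens[position], []):
        edited = list(tokens)
        edited[position] = word
        best = max(best, abs(original - model(edited)))
    return best


def token_erasure_score(model: TextModel, tokens: Sequence[str],
                        position: int) -> float:
    """Erasure-style criterion for comparison: drop the token entirely."""
    erased = list(tokens[:position]) + list(tokens[position + 1:])
    return abs(model(tokens) - model(erased))


# Hypothetical lexicon model: the score is the sum of per-word weights.
WEIGHTS = {"great": 2.0, "awful": -2.0}


def toy_model(tokens: Sequence[str]) -> float:
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


# Hypothetical substitution candidates; a real method would generate these.
ALTERNATIVES = {"great": ["awful", "okay"], "movie": ["film", "show"]}

sentence = ["the", "movie", "was", "great"]
print(token_counterfactual_score(toy_model, sentence, 3, ALTERNATIVES))  # 4.0
print(token_counterfactual_score(toy_model, sentence, 1, ALTERNATIVES))  # 0.0
print(token_erasure_score(toy_model, sentence, 3))                       # 2.0
```

Here an explanation that highlights "great" would score as more faithful than one highlighting "movie", because only edits to "great" move the toy model's output.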
Abstract
While recent years have witnessed the emergence of various explainable methods in machine learning, the degree to which explanations actually represent the reasoning process behind a model's prediction (namely, the faithfulness of an explanation) is still an open problem. One common way to measure faithfulness is to use erasure-based criteria. Though conceptually simple, erasure-based criteria can inevitably introduce biases and artifacts. We propose a new methodology to evaluate the faithfulness of explanations from the counterfactual reasoning perspective: the model should produce substantially different outputs for the original input and its corresponding counterfactual edited on a faithful feature. Specifically, we introduce two algorithms to find the proper counterfactuals in both discrete and continuous scenarios and then use the acquired counterfactuals to measure…
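The counterfactual criterion stated in the abstract can be sketched for continuous inputs as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the page does not describe how the paper finds counterfactuals, so a brute-force grid search over a single feature within a hypothetical `radius` stands in for it. `counterfactual_score`, `erasure_score`, `explanation_faithfulness`, the toy logistic model, and all parameter values are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# A model maps a continuous feature vector to a scalar output (e.g. a probability).
Model = Callable[[Sequence[float]], float]


def erasure_score(model: Model, x: Sequence[float], feature: int,
                  baseline: float = 0.0) -> float:
    """Erasure-style criterion for comparison: overwrite one feature with a
    fixed baseline and return the absolute change in the model output."""
    x_erased = list(x)
    x_erased[feature] = baseline
    return abs(model(x) - model(x_erased))


def counterfactual_score(model: Model, x: Sequence[float], feature: int,
                         radius: float = 1.0, steps: int = 20) -> float:
    """Grid-search edits to a single feature within +/- radius of its original
    value (a stand-in for a real counterfactual search) and return the
    largest absolute change in the model output."""
    original = model(x)
    best = 0.0
    for k in range(1, steps + 1):
        delta = radius * k / steps
        for sign in (-1.0, 1.0):
            x_cf = list(x)
            x_cf[feature] = x[feature] + sign * delta
            best = max(best, abs(original - model(x_cf)))
    return best


def explanation_faithfulness(model: Model, x: Sequence[float],
                             ranked_features: List[int], k: int = 1) -> float:
    """Mean counterfactual score over the top-k features an explanation ranks
    as most important; higher means more faithful under this sketch."""
    top = ranked_features[:k]
    return sum(counterfactual_score(model, x, f) for f in top) / len(top)


# Hypothetical toy model: a logistic unit that uses feature 0 and ignores feature 1.
def toy_model(v: Sequence[float]) -> float:
    return 1.0 / (1.0 + math.exp(-2.0 * v[0]))


x = [0.5, 3.0]
print(explanation_faithfulness(toy_model, x, [0, 1]))  # ranks the used feature first
print(explanation_faithfulness(toy_model, x, [1, 0]))  # prints 0.0 (ignored feature first)
print(erasure_score(toy_model, x, 0))
```

The sketch bounds each edit to a local radius around the original value instead of jumping to a fixed baseline as the erasure score does; whether and how the paper constrains its counterfactuals is not stated on this page.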
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
Methods: Counterfactual Explanations
