Provably Robust Bayesian Counterfactual Explanations under Model Changes
Jamie Duell, Xiuyi Fan

TL;DR
This paper introduces Probabilistically Safe Counterfactual Explanations (PSCE), a Bayesian method that generates high-confidence, low-variance counterfactual explanations which remain valid under model updates, with formal guarantees and empirical validation.
Contribution
The paper proposes a novel Bayesian framework for generating counterfactual explanations that are provably robust and safe under model changes, addressing a key limitation of existing methods.
Findings
PSCE provides formal probabilistic guarantees for counterfactual explanations.
Empirical results show PSCE produces more plausible and discriminative explanations.
PSCE outperforms state-of-the-art Bayesian CE methods in robustness and validity.
Abstract
Counterfactual explanations (CEs) offer interpretable insights into machine learning predictions by answering "what if?" questions. However, in real-world settings where models are frequently updated, existing counterfactual explanations can quickly become invalid or unreliable. In this paper, we introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are δ-safe, to ensure high predictive confidence, and ε-robust, to ensure low predictive variance. Based on Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes, which are adhered to in what we refer to as the ⟨δ, ε⟩-set. Uncertainty-aware constraints are integrated into our optimization framework and we validate our method empirically across diverse datasets. We compare our approach against state-of-the-art…
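The two conditions named in the abstract, high predictive confidence and low predictive variance, can be illustrated with a rough sketch. It is not the authors' method: the function `posterior_check`, the thresholds `conf_threshold` and `var_threshold`, and the toy logistic "posterior" are all hypothetical, and the paper's formal definitions of the safety and robustness conditions may differ. The idea is simply to score a candidate counterfactual against many plausible models and accept it only if the predictions are both confident and stable.

```python
import numpy as np

def posterior_check(target_probs, conf_threshold, var_threshold):
    """Hypothetical check: mean target-class probability is high enough
    and its variance across sampled models is low enough."""
    probs = np.asarray(target_probs, dtype=float)
    return bool(probs.mean() >= conf_threshold and probs.var() <= var_threshold)

# Toy stand-in for a Bayesian posterior: 200 sampled weight vectors
# for a two-feature logistic model (an assumption for illustration).
rng = np.random.default_rng(0)
weights = rng.normal(loc=[2.0, -1.0], scale=0.1, size=(200, 2))

def predict(x):
    """Target-class probability of x under each sampled model."""
    return 1.0 / (1.0 + np.exp(-(weights @ x)))

x_far = np.array([3.0, 0.0])   # candidate far from the decision boundary
x_near = np.array([0.1, 0.0])  # candidate close to the decision boundary

print(posterior_check(predict(x_far), conf_threshold=0.9, var_threshold=0.01))
print(posterior_check(predict(x_near), conf_threshold=0.9, var_threshold=0.01))
```

In this toy setup the candidate far from the boundary passes and the one near the boundary fails, which is the intuition behind requiring a counterfactual to stay valid when the model is updated.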
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
