Provably Robust Bayesian Counterfactual Explanations under Model Changes

Jamie Duell; Xiuyi Fan

arXiv:2601.16659·cs.LG·February 12, 2026

Open Access

TL;DR

This paper introduces Probabilistically Safe Counterfactual Explanations (PSCE), a Bayesian method ensuring high-confidence and robust counterfactual explanations that remain valid under model updates, with formal guarantees and empirical validation.

Contribution

The paper proposes a novel Bayesian framework for generating counterfactual explanations that are provably robust and safe under model changes, addressing a key limitation of existing methods.

Findings

01 PSCE provides formal probabilistic guarantees for counterfactual explanations.

02 Empirical results show PSCE produces more plausible and discriminative explanations.

03 PSCE outperforms state-of-the-art Bayesian CE methods in robustness and validity.

Abstract

Counterfactual explanations (CEs) offer interpretable insights into machine learning predictions by answering "what if?" questions. However, in real-world settings where models are frequently updated, existing counterfactual explanations can quickly become invalid or unreliable. In this paper, we introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are $δ$-safe, to ensure high predictive confidence, and $ϵ$-robust, to ensure low predictive variance. Based on Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes; these guarantees hold within what we refer to as the $⟨δ, ϵ⟩$-set. Uncertainty-aware constraints are integrated into our optimization framework, and we validate our method empirically across diverse datasets. We compare our approach against state-of-the-art…
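To make the two conditions in the abstract concrete, the sketch below checks a candidate counterfactual against a confidence threshold and a variance threshold. It is an illustration only, not the paper's method: the abstract does not give the formal definitions, so reading "$δ$-safe" as "mean predicted probability of the target class is at least `delta`" and "$ϵ$-robust" as "variance of that probability is at most `epsilon`" is an assumption. The function name `in_delta_epsilon_set` and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def in_delta_epsilon_set(target_probs, delta, epsilon):
    """Check one candidate counterfactual against two illustrative thresholds.

    target_probs: predicted probabilities of the desired class, one per model
                  drawn from a posterior (hypothetical inputs).
    delta:        minimum acceptable mean confidence (assumed reading of "safe").
    epsilon:      maximum acceptable variance (assumed reading of "robust").
    """
    mean_confidence = fmean(target_probs)
    predictive_variance = pvariance(target_probs)
    return mean_confidence >= delta and predictive_variance <= epsilon


# Hypothetical predictions from five sampled models for two candidates.
stable = [0.91, 0.93, 0.90, 0.92, 0.94]    # sampled models agree closely
unstable = [0.99, 0.55, 0.97, 0.60, 0.98]  # sampled models disagree

print(in_delta_epsilon_set(stable, delta=0.8, epsilon=0.01))
print(in_delta_epsilon_set(unstable, delta=0.8, epsilon=0.01))
```

Under these assumed thresholds the first candidate passes both checks, while the second clears the confidence threshold on average but fails the variance check, which is the kind of candidate that could become invalid after a model update.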

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis