TL;DR
This paper introduces an interval abstraction method that provides provable robustness guarantees for counterfactual explanations in machine learning models, ensuring their validity under a wide range of model changes.
Contribution
The paper proposes a novel interval abstraction technique for parametric models, formalizes a new robustness notion called Δ-robustness, and develops algorithms to verify and generate robust counterfactual explanations.
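To make the idea of an interval abstraction concrete, here is a minimal sketch for a logistic regression model. It is not the paper's algorithm. It assumes that "plausible model changes" means every weight and the bias may shift by at most `delta`, and that a CE counts as robust when its predicted class is unchanged for every model in that set. The names `logit_bounds`, `is_delta_robust`, and `delta` are hypothetical and chosen for illustration only.

```python
import numpy as np


def logit_bounds(x, w, b, delta):
    """Bounds on w'.x + b' over all models with |w' - w| <= delta and
    |b' - b| <= delta elementwise (an assumed form of model change)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    center = float(w @ x + b)
    # Worst case: each weight moves by delta against the sign of x_i,
    # and the bias moves by delta as well.
    radius = delta * (float(np.abs(x).sum()) + 1.0)
    return center - radius, center + radius


def is_delta_robust(x_cf, w, b, delta, target=1):
    """True if x_cf is classified as `target` by every model in the interval."""
    lo, hi = logit_bounds(x_cf, w, b, delta)
    return lo >= 0.0 if target == 1 else hi < 0.0


# Hypothetical model and counterfactual, for illustration only.
w = np.array([1.0, -2.0])
b = 0.5
x_cf = np.array([3.0, 0.5])

print(is_delta_robust(x_cf, w, b, delta=0.5))  # small perturbation set
print(is_delta_robust(x_cf, w, b, delta=1.0))  # larger perturbation set
```

Because the logit is linear in the parameters, these bounds are exact for this toy case, so the check covers the infinitely many models in the interval without enumerating them. For neural networks, the bounds would have to be propagated layer by layer and would generally be conservative.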
Findings
The approach offers provable robustness guarantees for CEs under a possibly infinite set of plausible model changes.
Empirical results demonstrate the effectiveness of the method on neural networks and logistic regression.
Benchmarking shows the superiority of the proposed algorithms in generating robust CEs.
Abstract
Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, CEs found by existing methods often become invalid when slight changes occur in the parameters of the model they were generated for. The literature lacks a way to provide exhaustive robustness guarantees for CEs under model changes, in that existing methods to improve CEs' robustness are mostly heuristic, and the robustness performances are evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees for CEs under a possibly infinite set of plausible model changes. Based on this idea, we formalise a…
Taxonomy
Methods: Logistic Regression · Sparse Evolutionary Training
