Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation
Junqi Jiang, Jianglin Lan, Francesco Leofante, Antonio Rago, Francesca Toni

TL;DR
This paper introduces PROPLACE, a novel method for generating counterfactual explanations for neural networks that are provably robust, plausible, and optimized for closeness, addressing limitations of existing approaches.
Contribution
PROPLACE is the first method to simultaneously optimize for robustness, plausibility, and closeness in counterfactual explanations with proven convergence and soundness.
Findings
PROPLACE outperforms five baseline methods on robustness, plausibility, and closeness metrics.
The iterative algorithm behind PROPLACE is proven to converge and to be sound and complete.
Experimental results demonstrate state-of-the-art performance across multiple evaluation metrics.
Abstract
Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers wrt the training dataset. In fact, no existing method simultaneously optimises for closeness and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations…
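To make the abstract's two definitions concrete, here is a minimal sketch under stated assumptions; it is not the PROPLACE algorithm. It assumes a linear binary classifier with score w·x + b (class 1 when the score is non-negative) in place of a neural network, and an ∞-norm ball of radius delta around the parameters. The function names (`score`, `closest_ce`, `worst_case_score`, `is_robust`) and the example numbers are hypothetical. For this simple model the L2-closest counterfactual is a projection towards the decision boundary, and the worst-case score over the parameter ball has the closed form w·x + b − delta·(‖x‖₁ + 1).

```python
def score(w, b, x):
    """Linear classifier score; class 1 when the score is >= 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def closest_ce(w, b, x, margin=1e-6):
    """L2-closest point to x whose score equals `margin`.

    Assumes x is currently classified as class 0 (score < 0).
    The point is reached by moving x along w, the direction in
    which the score grows fastest.
    """
    norm_sq = sum(wi * wi for wi in w)
    step = (margin - score(w, b, x)) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]


def worst_case_score(w, b, x, delta):
    """Lowest score of x over all parameter changes in the ball.

    Each weight and the bias may shift by at most delta (an
    infinity-norm ball, an assumption of this sketch), so the
    score can drop by at most delta * (||x||_1 + 1).
    """
    return score(w, b, x) - delta * (sum(abs(xi) for xi in x) + 1.0)


def is_robust(w, b, x, delta):
    """True if x stays in class 1 for every model in the ball."""
    return worst_case_score(w, b, x, delta) >= 0.0


# Hypothetical example values, chosen only for illustration.
w, b = [1.0, -2.0], 0.5
x = [0.0, 1.0]                                 # score -1.5, class 0
ce = closest_ce(w, b, x)                       # sits just past the boundary
ce_robust = closest_ce(w, b, x, margin=0.5)    # pushed further into class 1
```

In this sketch `ce` is valid for the original model but fails `is_robust(w, b, ce, 0.05)`, because it sits almost on the decision boundary, while `ce_robust` passes the same check at the cost of a larger distance from `x`. That trade-off between closeness and robustness is the tension the abstract describes; plausibility is not modelled here.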
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science
