Counterfactual Training: Teaching Models Plausible and Actionable Explanations
Patrick Altmeyer, Aleksander Buszydlik, Arie van Deursen, Cynthia C. S. Liem

TL;DR
This paper introduces counterfactual training, a novel approach that incorporates counterfactual explanations during model training to enhance interpretability and robustness, so that the explanations a model yields are plausible and actionable in real-world applications.
Contribution
It presents a new training regime that uses counterfactual explanations during training itself to improve model interpretability and robustness, rather than generating them only after the fact, as traditional post-hoc explanation methods do.
Findings
Models trained with counterfactual training produce more plausible explanations.
Counterfactual training improves models' adversarial robustness.
Theoretical analysis supports the effectiveness of the proposed method.
Abstract
We propose a novel training regime termed counterfactual training that leverages counterfactual explanations to increase the explanatory capacity of models. Counterfactual explanations have emerged as a popular post-hoc explanation method for opaque machine learning models: they inform how factual inputs would need to change in order for a model to produce some desired output. To be useful in real-world decision-making systems, counterfactuals should be plausible with respect to the underlying data and actionable with respect to the feature mutability constraints. Much existing research has therefore focused on developing post-hoc methods to generate counterfactuals that meet these desiderata. In this work, we instead hold models directly accountable for the desired end goal: counterfactual training employs counterfactuals during the training phase to minimize the divergence between…
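To make the two desiderata in the abstract concrete, here is a minimal sketch of how a single counterfactual can be generated post hoc. It is not the authors' counterfactual training method. It is a generic gradient-based search in the style of Wachter et al., applied to a toy logistic model, and every name and parameter in it is a hypothetical choice for illustration. The sketch minimises a classification loss toward the target class plus an L2 penalty on the distance from the factual input. Actionability is modelled as a boolean mutability mask, and features marked immutable are never updated. Plausibility is approximated by the distance from the counterfactual to the nearest training sample of the target class; this is an assumed proxy, not the paper's metric.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Positive-class probability under a toy logistic model (hypothetical)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def counterfactual_search(
    x: Sequence[float],
    w: Sequence[float],
    b: float,
    target: int,
    mutable: Sequence[bool],
    lam: float = 0.1,   # weight of the proximity penalty (assumed value)
    lr: float = 0.1,    # gradient-descent step size (assumed value)
    steps: int = 500,   # fixed number of iterations (assumed value)
) -> List[float]:
    """Wachter-style search: minimise BCE(f(x'), target) + lam * ||x' - x||^2.

    Only features flagged as mutable are updated (actionability constraint).
    """
    x_cf = list(x)
    for _ in range(steps):
        p = predict_proba(w, b, x_cf)
        # For a logistic model, d BCE / d logit = p - target.
        dlogit = p - target
        for j in range(len(x_cf)):
            if not mutable[j]:
                continue  # immutable feature: left exactly at its factual value
            grad = dlogit * w[j] + 2.0 * lam * (x_cf[j] - x[j])
            x_cf[j] -= lr * grad
    return x_cf


def distance_to_target_data(
    x_cf: Sequence[float], X_target: Sequence[Sequence[float]]
) -> float:
    """Plausibility proxy (assumption): Euclidean distance from x' to the
    nearest training sample of the target class; smaller is more plausible."""
    return min(math.dist(x_cf, xi) for xi in X_target)


# Usage: a factual input predicted negative, where only feature 1 may change.
w, b = [1.0, 2.0], -3.0
x = [0.0, 0.0]
x_cf = counterfactual_search(x, w, b, target=1, mutable=[False, True])
print(x_cf, predict_proba(w, b, x_cf))
```

Because the log-loss of a logistic model plus an L2 penalty is convex in the input, plain gradient descent with a small step size converges to a single counterfactual here. Raising `lam` keeps the counterfactual closer to the factual input, at the risk of not crossing the decision boundary.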
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
