Declarative Approaches to Counterfactual Explanations for Classification
Leopoldo Bertossi

TL;DR
This paper introduces declarative answer-set programming methods for generating counterfactual explanations in classification tasks. The methods handle both black-box and rule-based models and can incorporate domain knowledge.
Contribution
It presents a novel logic programming framework for computing counterfactuals and responsibility scores, extending to probabilistic and domain-aware scenarios.
Findings
Applicable to black-box and rule-based models
Enables computation of maximum responsibility scores
Incorporates semantic and probabilistic extensions
Abstract
We propose answer-set programs that specify and compute counterfactual interventions on entities that are input to a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely "responsibility scores". The approach and the programs can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus of this work is on the specification and computation of "best" counterfactual entities, i.e., those that lead to maximum responsibility scores. From them one can read off the explanations as maximum-responsibility feature values in the original entity. We also extend the programs to bring into the picture semantic or…
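To make the notions of "best" counterfactual entities and maximum responsibility scores concrete, here is a minimal Python sketch. It is not the paper's answer-set programs: it swaps in plain brute-force enumeration over finite feature domains, treating the classifier as a black-box function. It assumes the usual causality-based reading in which a feature's responsibility is 1/(1+|Γ|) for a smallest contingency set Γ. In a minimum-cardinality counterfactual that changes k features, each changed feature therefore scores 1/k. The function names, the toy classifier `approve`, its domains, and the optional `admissible` filter (a stand-in for domain knowledge) are all hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Classifier = Callable[[Tuple], int]
Constraint = Callable[[Tuple], bool]


def best_counterfactuals(
    entity: Sequence,
    domains: Sequence[Sequence],
    classify: Classifier,
    admissible: Optional[Constraint] = None,
) -> List[Tuple[Tuple, Tuple[int, ...]]]:
    """Return every minimum-cardinality counterfactual as (new_entity, changed_feature_indices)."""
    original = tuple(entity)
    label = classify(original)
    n = len(original)
    # Try intervening on 1 feature, then 2, ...; the first size that flips the label is minimal.
    for k in range(1, n + 1):
        found = []
        for idxs in combinations(range(n), k):
            # Each chosen feature must take a value different from its original one.
            alternatives = [[v for v in domains[i] if v != original[i]] for i in idxs]
            for values in product(*alternatives):
                candidate = list(original)
                for i, v in zip(idxs, values):
                    candidate[i] = v
                candidate = tuple(candidate)
                # Hypothetical domain knowledge: skip entities that violate a constraint.
                if admissible is not None and not admissible(candidate):
                    continue
                if classify(candidate) != label:
                    found.append((candidate, idxs))
        if found:
            return found
    return []


def max_responsibility(entity, domains, classify, admissible=None) -> Dict[int, float]:
    """Score 1/k for each feature changed in some minimum-cardinality counterfactual (k changes)."""
    scores: Dict[int, float] = {}
    for _, idxs in best_counterfactuals(entity, domains, classify, admissible):
        for i in idxs:
            scores[i] = 1.0 / len(idxs)
    return scores


# Hypothetical toy classifier: approve (1) iff income is high and debt is low.
def approve(e: Tuple) -> int:
    income, debt, _age = e
    return int(income == "high" and debt == "low")


DOMAINS = [["low", "high"], ["low", "high"], ["young", "old"]]

print(max_responsibility(("high", "low", "young"), DOMAINS, approve))  # {0: 1.0, 1: 1.0}
print(max_responsibility(("low", "high", "young"), DOMAINS, approve))  # {0: 0.5, 1: 0.5}
```

In the first call, changing either income or debt alone flips the outcome, so both feature values get responsibility 1. In the second call, both must change together, so each gets 1/2. The age feature never appears. Exhaustive enumeration is exponential in the number of features and only suits toy inputs. The sketch also reports scores only for features that occur in some best counterfactual. A declarative encoding that minimizes the number of interventions avoids hand-writing this search loop.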
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Bayesian Modeling and Causal Inference · Semantic Web and Ontologies
