TL;DR
This paper unifies feature attribution and counterfactual explanations using causality, showing how they complement each other and evaluating their effectiveness on benchmark datasets and a real-world case study.
Contribution
It introduces a method to derive feature attributions from counterfactuals and uses counterfactuals to assess attribution explanations, highlighting their mutual benefits.
Findings
Feature attributions can be generated from counterfactuals.
Counterfactuals can evaluate the necessity and sufficiency of attributions.
Methods often disagree on feature importance rankings.
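As a rough illustration of the first finding, the sketch below scores each feature by the fraction of counterfactual examples in which its value differs from the original input. This scoring rule, the helper name attribution_from_counterfactuals, and the toy arrays are assumptions made for illustration; the summary above does not specify the paper's exact procedure.

```python
import numpy as np

def attribution_from_counterfactuals(x, counterfactuals, atol=1e-8):
    """Score each feature by how often it changes across counterfactuals.

    Hypothetical helper: the "fraction of counterfactuals that modify the
    feature" rule is an assumption, not a confirmed detail of the paper.
    """
    x = np.asarray(x, dtype=float)
    cfs = np.asarray(counterfactuals, dtype=float)
    if cfs.ndim != 2 or cfs.shape[1] != x.shape[0]:
        raise ValueError("counterfactuals must have shape (n_cf, n_features)")
    # True where a counterfactual's feature value differs from the input.
    changed = ~np.isclose(cfs, x, atol=atol)
    # Column-wise mean of booleans = fraction of counterfactuals changing it.
    return changed.mean(axis=0)

# Toy input with three features and four made-up counterfactuals.
x = np.array([1.0, 0.0, 5.0])
cfs = np.array([
    [1.0, 1.0, 5.0],  # only feature 1 changed
    [2.0, 1.0, 5.0],  # features 0 and 1 changed
    [1.0, 1.0, 4.0],  # features 1 and 2 changed
    [1.0, 0.0, 3.0],  # only feature 2 changed
])
scores = attribution_from_counterfactuals(x, cfs)
print(scores)
```

In this toy case feature 1 changes in three of the four counterfactuals, so it receives the highest score. A real pipeline would obtain the counterfactuals from a dedicated generator rather than hand-written arrays.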
Abstract
Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms…
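The necessity and sufficiency checks named in the Findings can be approximated with simple perturbation tests. The sketch below is an assumption-laden proxy, not the paper's procedure: the toy classifier predict, the helpers necessity_score and sufficiency_score, and the reference array are all hypothetical. Necessity is taken as the fraction of perturbations that flip the prediction when only the chosen features change; sufficiency is the fraction that keep the prediction when the chosen features are held fixed and every other feature changes.

```python
import numpy as np

def predict(X):
    """Toy classifier (assumption): class 1 when feature 0 exceeds 0.5."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return (X[:, 0] > 0.5).astype(int)

def necessity_score(predict_fn, x, subset, reference):
    """Fraction of flips when ONLY the features in `subset` are replaced."""
    x = np.asarray(x, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Start from copies of x, then overwrite the subset with reference values.
    candidates = np.tile(x, (len(reference), 1))
    candidates[:, subset] = reference[:, subset]
    original = predict_fn(x)[0]
    return float(np.mean(predict_fn(candidates) != original))

def sufficiency_score(predict_fn, x, subset, reference):
    """Fraction of unchanged predictions when `subset` is fixed to x's values
    and all other features take reference values."""
    x = np.asarray(x, dtype=float)
    candidates = np.array(reference, dtype=float)  # copy of the reference rows
    candidates[:, subset] = x[subset]
    original = predict_fn(x)[0]
    return float(np.mean(predict_fn(candidates) == original))

# Made-up input and reference rows supplying alternative feature values.
x = np.array([1.0, 0.0])
reference = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.2, 0.3]])

nec_f0 = necessity_score(predict, x, [0], reference)
nec_f1 = necessity_score(predict, x, [1], reference)
suf_f0 = sufficiency_score(predict, x, [0], reference)
suf_f1 = sufficiency_score(predict, x, [1], reference)
print(nec_f0, nec_f1, suf_f0, suf_f1)
```

Because the toy classifier depends only on feature 0, that feature scores high on both measures while feature 1 scores low on both. An attribution method that ranked feature 1 above feature 0 here would fail this check.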
Taxonomy
Methods: Shapley Additive Explanations · Local Interpretable Model-Agnostic Explanations · Causal inference
