Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

Ramaravind Kommiya Mothilal; Divyat Mahajan; Chenhao Tan; Amit Sharma

arXiv:2011.04917·cs.LG·June 1, 2021

TL;DR

This paper unifies feature attribution and counterfactual explanations using causality, showing how they complement each other and evaluating their effectiveness on benchmark datasets and a real-world case study.

Contribution

It introduces a method to derive feature attributions from counterfactuals and uses counterfactuals to assess attribution explanations, highlighting their mutual benefits.

Findings

01. Feature attributions can be generated from counterfactuals.

02. Counterfactuals can evaluate the necessity and sufficiency of attributions.

03. Methods often disagree on feature importance rankings.
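
The first finding can be illustrated with a minimal Python sketch. It assumes that a feature's attribution score is the fraction of counterfactual examples in which that feature's value differs from the original input. This scoring rule, the function name `attribution_from_counterfactuals`, and the toy loan-style data are assumptions made for illustration, not details taken from this page.

```python
from typing import Dict, Hashable, List

Instance = Dict[str, Hashable]


def attribution_from_counterfactuals(
    original: Instance, counterfactuals: List[Instance]
) -> Dict[str, float]:
    """Score each feature by the fraction of counterfactuals that changed it.

    Assumed rule (illustrative): a feature that is modified in many
    counterfactuals is treated as important for changing the outcome.
    """
    if not counterfactuals:
        raise ValueError("need at least one counterfactual example")
    n = len(counterfactuals)
    return {
        name: sum(cf[name] != value for cf in counterfactuals) / n
        for name, value in original.items()
    }


# Hypothetical input and counterfactuals (toy values, not real results).
original = {"age": 30, "income": 40_000, "education": "HS"}
counterfactuals = [
    {"age": 30, "income": 55_000, "education": "HS"},
    {"age": 30, "income": 60_000, "education": "BSc"},
    {"age": 35, "income": 52_000, "education": "HS"},
    {"age": 30, "income": 58_000, "education": "HS"},
]

scores = attribution_from_counterfactuals(original, counterfactuals)
print(scores)
```

In this toy data every counterfactual changes `income`, so it receives the highest score, while `age` and `education` each change in only one of the four examples.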

Abstract

Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms…
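
The necessity and sufficiency notions mentioned in the abstract can be sketched in Python under simplifying assumptions. Here a feature is called necessary for an input if changing that feature alone can flip the prediction, and sufficient if, with that feature held fixed, no change to the other features can flip it. These working definitions, the toy classifier `predict`, the candidate value grid `GRID`, and the exhaustive search (a stand-in for a counterfactual generator such as the DiCE repository listed on this page) are all hypothetical choices for illustration.

```python
from itertools import product
from typing import Dict, Iterable

Instance = Dict[str, int]

# Hypothetical candidate values that each feature may take during the search.
GRID = {"income": [30, 40, 50, 60], "degree": [0, 1], "age": [25, 35, 45]}


def predict(x: Instance) -> int:
    """Toy classifier (assumption): approve on high income, or on
    moderate income combined with a degree. Age is ignored."""
    return int(x["income"] >= 50 or (x["income"] >= 40 and x["degree"] == 1))


def has_counterfactual(x: Instance, free: Iterable[str]) -> bool:
    """True if some assignment to the `free` features flips the prediction."""
    free = list(free)
    base = predict(x)
    for values in product(*(GRID[f] for f in free)):
        candidate = {**x, **dict(zip(free, values))}
        if predict(candidate) != base:
            return True
    return False


def is_necessary(x: Instance, feature: str) -> bool:
    # Only `feature` may change; all other features stay at their values.
    return has_counterfactual(x, [feature])


def is_sufficient(x: Instance, feature: str) -> bool:
    # `feature` is held fixed; every other feature may change.
    others = [f for f in GRID if f != feature]
    return not has_counterfactual(x, others)


x = {"income": 30, "degree": 0, "age": 35}
report = {f: (is_necessary(x, f), is_sufficient(x, f)) for f in GRID}
print(report)
```

For this input, raising `income` alone flips the toy prediction, and keeping `income` at 30 blocks every flip, so `income` is both necessary and sufficient here, while `degree` and `age` are neither. A fuller evaluation would aggregate such checks over many inputs rather than a single one.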

Code & Models

Repositories

interpretml/DiCE (tf)

Taxonomy

Methods: Shapley Additive Explanations · Local Interpretable Model-Agnostic Explanations · Causal inference