Do Not Trust Additive Explanations

Alicja Gosiewska; Przemyslaw Biecek

arXiv:1903.11420·cs.LG·May 11, 2020·36 cites

TL;DR

This paper critically examines the faithfulness of additive explanations like LIME and SHAP in complex models, introduces a new interaction detection method, and benchmarks their reliability in the presence of feature interactions.

Contribution

It introduces a novel method to detect interactions in instance-level explanations and evaluates the reliability of additive explanations in non-additive models.

Findings

1. Additive explanations can be misleading in models with feature interactions.

2. The new interaction detection method effectively identifies when explanations are unreliable.

3. Benchmark results show frequent discrepancies between explanations and true model behavior.

Abstract

Explainable Artificial Intelligence (XAI) has received a great deal of attention recently. Explainability is being presented as a remedy for the distrust of complex and opaque models. Model agnostic methods such as LIME, SHAP, or Break Down promise instance-level interpretability for any complex machine learning model. But how faithful are these additive explanations? Can we rely on additive explanations for non-additive models? In this paper, we (1) examine the behavior of the most popular instance-level explanations under the presence of interactions, (2) introduce a new method that detects interactions for instance-level explanations, (3) perform a large scale benchmark to see how frequently additive explanations may be misleading.
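The question the abstract raises can be illustrated with a small sketch. This is not the paper's interaction detection method or its benchmark; it is a simplified, assumed setup in which a toy model `f(x1, x2) = x1 * x2` (a pure interaction) is explained by fixing features one at a time and recording the change in the mean prediction, in the spirit of sequential attributions such as Break Down. The model, the four-row background data, and the function names are all hypothetical.

```python
from itertools import permutations

import numpy as np


def model(X):
    # Hypothetical non-additive model: a pure interaction of two features.
    return X[:, 0] * X[:, 1]


def sequential_contributions(model, background, instance, order):
    """Fix features to the instance's values one at a time, in `order`,
    and record the change in the mean prediction after each step."""
    data = background.copy()
    previous = model(data).mean()
    contributions = {}
    for j in order:
        data[:, j] = instance[j]
        current = model(data).mean()
        contributions[j] = float(current - previous)
        previous = current
    return contributions


def averaged_contributions(model, background, instance):
    """Average the sequential contributions over every feature ordering
    (a brute-force, Shapley-style average; only feasible for few features)."""
    n_features = background.shape[1]
    orders = list(permutations(range(n_features)))
    totals = np.zeros(n_features)
    for order in orders:
        for j, value in sequential_contributions(
            model, background, instance, order
        ).items():
            totals[j] += value
    return totals / len(orders)


# Assumed background data: every combination of two binary features.
background = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
instance = np.array([1.0, 1.0])

print(sequential_contributions(model, background, instance, (0, 1)))
# {0: 0.25, 1: 0.5}
print(sequential_contributions(model, background, instance, (1, 0)))
# {1: 0.25, 0: 0.5}
print(averaged_contributions(model, background, instance).tolist())
# [0.375, 0.375]
```

In this toy case the two orderings disagree about which feature matters more, and the averaged attribution splits the effect evenly. Each version sums to the same total (0.75, the gap between the instance's prediction and the mean prediction), yet none of them signals that the effect exists only when both features are set together.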

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning

Methods: Interpretability · Shapley Additive Explanations · Local Interpretable Model-Agnostic Explanations