An unexpected unity among methods for interpreting model predictions
Scott Lundberg, Su-In Lee

TL;DR
This paper reveals a unifying additive representation for interpreting complex model predictions, demonstrating common principles across various methods and enabling new visual explanations.
Contribution
It introduces a model-agnostic additive importance representation that unifies existing interpretation methods and provides a basis for novel visual explanations.
Findings
Unified interpretation framework for prediction importance
Optimal additive importance representation satisfying key properties
New visual explanation techniques based on the unified representation
Abstract
Understanding why a model made a certain prediction is crucial in many data science fields. Interpretable predictions engender appropriate trust and provide insight into how the model may be improved. However, with large modern datasets the best accuracy is often achieved by complex models even experts struggle to interpret, which creates a tension between accuracy and interpretability. Recently, several methods have been proposed for interpreting predictions from complex models by estimating the importance of input features. Here, we present how a model-agnostic additive representation of the importance of input features unifies current methods. This representation is optimal, in the sense that it is the only set of additive values that satisfies important properties. We show how we can leverage these properties to create novel visual explanations of model predictions. The thread of…
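The abstract does not give the formula behind the additive values, so the following is only a minimal sketch under stated assumptions: that the values are Shapley-style attributions, and that a "missing" feature can be simulated by swapping in a fixed baseline value. The names `additive_importances`, `toy_model`, and `baseline` are hypothetical and chosen for illustration. The sketch shows the additive property: the attributions plus the baseline prediction sum to the model's prediction.

```python
from itertools import combinations
from math import factorial


def additive_importances(f, x, baseline):
    """Exact Shapley-style attributions for a model f at input x.

    Assumption: a feature that is "absent" from a subset is replaced by
    its value in `baseline`. This is a simplification for illustration.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their observed value; the rest use the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of adding feature i to the subset.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one linear term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = additive_importances(toy_model, x, baseline)
base_value = toy_model(baseline)

# Additivity: baseline prediction plus attributions recovers the prediction.
print(phi, base_value + sum(phi), toy_model(x))
```

The loop enumerates every subset of features, so the cost grows exponentially with the number of features; it is meant for small toy inputs only.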
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
