The Mythos of Model Interpretability

Zachary C. Lipton

arXiv:1606.03490·cs.LG·March 7, 2017

TL;DR

This paper critically examines the concept of interpretability in machine learning, highlighting the diverse motivations behind it and the conflicting notions of what it means, and questioning common assumptions about model transparency and complexity.

Contribution

It clarifies the motivations for interpretability and the conflicting notions of what it means, and challenges the prevailing belief that linear models are interpretable while deep neural networks are not.

Findings

01. Motivations for interpretability are diverse and sometimes conflicting.

02. Transparency and post-hoc explanations are competing notions of interpretability.

03. Linear models are not necessarily more interpretable than neural networks.

Abstract

Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc…

Taxonomy

Methods: Interpretability