TL;DR
This paper critically examines interpretability in machine learning: it surveys the diverse motivations for it, contrasts conflicting notions of what it means, and questions common assumptions about model transparency and complexity.
Contribution
It clarifies the motivations for interpretability and the conflicting notions of what it means, and challenges the prevailing belief that linear models are inherently more interpretable than neural networks.
Findings
Motivations for interpretability are diverse and sometimes conflicting.
Transparency and post-hoc explanations are competing notions of interpretability.
Linear models are not necessarily more interpretable than neural networks.
Abstract
Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet the task of interpretation appears underspecified. Papers provide diverse and sometimes non-overlapping motivations for interpretability, and offer myriad notions of what attributes render models interpretable. Despite this ambiguity, many papers proclaim interpretability axiomatically, absent further explanation. In this paper, we seek to refine the discourse on interpretability. First, we examine the motivations underlying interest in interpretability, finding them to be diverse and occasionally discordant. Then, we address model properties and techniques thought to confer interpretability, identifying transparency to humans and post-hoc…
Taxonomy
Methods: Interpretability
