Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability
Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy

TL;DR
This paper formally studies the statistical trade-offs involved in enforcing interpretability constraints in machine learning, focusing on accuracy impacts within empirical risk minimization for binary classification.
Contribution
It introduces a novel framework modeling interpretability enforcement as a hypothesis set restriction, enabling analysis of accuracy trade-offs using statistical learning theory.
Findings
Trade-offs depend on the nature of the interpretable hypothesis set
Enforcing interpretability may or may not incur additional statistical risk
The framework facilitates understanding of when interpretability impacts accuracy
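The idea of treating interpretability as a restriction on the hypothesis set can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D threshold classifiers, the choice of "interpretable" thresholds at 0.0, 0.5, and 1.0, the 10% label noise, and the helper names `empirical_risk`, `erm`, `threshold`, and `make_sample`. It only compares empirical risk on one sample and does not reproduce the paper's statistical analysis.

```python
import random

def empirical_risk(h, sample):
    """Fraction of labeled points (x, y) that hypothesis h misclassifies."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(hypotheses, sample):
    """Return a hypothesis with the smallest empirical risk (first on ties)."""
    return min(hypotheses, key=lambda h: empirical_risk(h, sample))

def threshold(t):
    """1-D threshold classifier: predict 1 when x >= t, else 0."""
    return lambda x: int(x >= t)

# Hypothetical setup: the full class is a fine grid of thresholds, and the
# "interpretable" subset keeps only three round thresholds (an assumption).
full_class = [threshold(i / 100) for i in range(101)]
interpretable = [threshold(t) for t in (0.0, 0.5, 1.0)]

def make_sample(n, true_t, noise, rng):
    """Draw n points uniformly from [0, 1), label by true_t, flip with prob. noise."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= true_t)
        if rng.random() < noise:
            y = 1 - y
        sample.append((x, y))
    return sample

rng = random.Random(0)
results = []
for true_t in (0.5, 0.3):
    sample = make_sample(200, true_t, 0.1, rng)
    risk_full = empirical_risk(erm(full_class, sample), sample)
    risk_interp = empirical_risk(erm(interpretable, sample), sample)
    results.append((true_t, risk_full, risk_interp))
    print(f"true threshold {true_t}: full={risk_full:.3f}, "
          f"interpretable={risk_interp:.3f}")
```

Because the interpretable set here is a subset of the full class, restricted ERM can never achieve lower empirical risk than unrestricted ERM on the same sample. How large the gap is depends on whether the restricted set happens to contain a good hypothesis for the data at hand, which is the kind of dependence the findings above describe.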
Abstract
To date, there has been no formal study of the statistical cost of interpretability in machine learning. As such, the discourse around potential trade-offs is often informal and misconceptions abound. In this work, we aim to initiate a formal study of these trade-offs. A seemingly insurmountable roadblock is the lack of any agreed-upon definition of interpretability. Instead, we propose a shift in perspective. Rather than attempt to define interpretability, we propose to model the act of enforcing interpretability. As a starting point, we focus on the setting of empirical risk minimization for binary classification, and view interpretability as a constraint placed on learning. That is, we assume we are given a subset of hypotheses that are deemed to be interpretable, possibly depending on the data distribution and other aspects of the context. We then model the act of…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: Interpretability
