The Price of Interpretability
Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin

TL;DR
This paper introduces a formal framework for quantifying the tradeoff between interpretability and predictive accuracy in machine learning models, and demonstrates practical algorithms on real and synthetic datasets.
Contribution
It presents a mathematical framework in which models are constructed in a sequence of interpretable steps, generalizes standard interpretability proxies into a parametrized family of measures, and quantifies the interpretability-accuracy tradeoff.
Findings
A formal measure of interpretability is developed.
The framework recovers standard interpretability proxies.
Algorithms are demonstrated on real and synthetic datasets.
Abstract
When quantitative models are used to support decision-making on complex and important topics, understanding a model's "reasoning" can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the "price" of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical…
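To make the idea of building a model in interpretable steps concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes that each step adds one feature to a least-squares linear model (so the step count equals the sparsity proxy mentioned in the abstract) and uses plain greedy forward selection. The function name `greedy_sparse_path` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def greedy_sparse_path(X, y, max_steps):
    """Build a linear model one feature at a time (illustrative assumption:
    one added feature = one interpretable step).

    Returns a list of (selected_features, mean_squared_error) pairs,
    one per step, starting from the empty model that predicts 0.
    """
    n, d = X.shape
    selected = []
    path = [((), float(np.mean(y ** 2)))]
    for _ in range(min(max_steps, d)):
        best = None
        for j in range(d):
            if j in selected:
                continue
            cols = selected + [j]
            # Refit least squares on the candidate feature set.
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            mse = float(np.mean((X[:, cols] @ coef - y) ** 2))
            if best is None or mse < best[1]:
                best = (j, mse)
        selected.append(best[0])
        path.append((tuple(selected), best[1]))
    return path


# Hypothetical synthetic data: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)

path = greedy_sparse_path(X, y, max_steps=3)
for feats, mse in path:
    print(len(feats), feats, round(mse, 4))
```

Each printed row pairs a number of steps with the training error reached at that point. Reading down the rows gives a rough picture of the tradeoff the abstract calls the "price" of interpretability: how much error remains when the model is restricted to fewer steps.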
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
Methods: Interpretability
