TL;DR
This paper introduces a decision-theoretic method for model interpretability in Bayesian frameworks by optimizing a utility function that balances explanation fidelity and interpretability, demonstrated with real-world data.
Contribution
It proposes a model-agnostic approach that finds interpretable proxy models mimicking complex Bayesian models using a utility function, improving accuracy and stability.
Findings
The approach produces more accurate models than prior methods at the same level of interpretability.
The approach is model-agnostic and can be applied with standard optimization tools.
The generated models are more stable than those produced by other interpretability approaches.
Abstract
A salient approach to interpretable machine learning is to restrict modeling to simple models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users' preferences, not the data generation mechanism; it is more natural to formulate interpretability as a utility function. In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model which does not compromise accuracy, is fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability…
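The two-step procedure described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the reference model is a plain k-nearest-neighbour average (a non-Bayesian stand-in for the black-box predictive model), the proxy family is a small one-dimensional regression tree, and the utility is assumed to be negative squared disagreement with the reference minus a penalty per leaf. The paper's actual utility may differ.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic 1-D regression data (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [(1.0 if x > 0.5 else 0.0) + 0.3 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def knn_reference(xs, ys, k=10):
    """Step 1 stand-in: a k-NN mean predictor playing the role of the reference model."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / k
    return predict


def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def fit_tree(xs, targets, depth):
    """Greedy regression tree on one feature, returned as nested tuples."""
    mean = sum(targets) / len(targets)
    if depth == 0 or len(xs) < 4:
        return ("leaf", mean)
    pairs = sorted(zip(xs, targets))
    best_cut, best_cost = None, sse(targets)
    for cut in range(2, len(pairs) - 1):
        cost = sse([t for _, t in pairs[:cut]]) + sse([t for _, t in pairs[cut:]])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    if best_cut is None:
        return ("leaf", mean)
    threshold = (pairs[best_cut - 1][0] + pairs[best_cut][0]) / 2
    left, right = pairs[:best_cut], pairs[best_cut:]
    return ("split", threshold,
            fit_tree([x for x, _ in left], [t for _, t in left], depth - 1),
            fit_tree([x for x, _ in right], [t for _, t in right], depth - 1))


def predict_tree(tree, x):
    """Walk the tree down to a leaf and return its value."""
    while tree[0] == "split":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]


def n_leaves(tree):
    """Leaf count, used here as the (assumed) complexity measure."""
    return 1 if tree[0] == "leaf" else n_leaves(tree[2]) + n_leaves(tree[3])


def utility(tree, xs, ref_preds, penalty):
    """Assumed utility: fidelity to the reference minus a per-leaf penalty."""
    loss = sum((predict_tree(tree, x) - r) ** 2 for x, r in zip(xs, ref_preds)) / len(xs)
    return -loss - penalty * n_leaves(tree)


def select_proxy(xs, ys, penalty=0.01, max_depth=4):
    """Step 2: fit candidate trees to the reference predictions, keep the best utility."""
    reference = knn_reference(xs, ys)
    ref_preds = [reference(x) for x in xs]
    candidates = [fit_tree(xs, ref_preds, d) for d in range(max_depth + 1)]
    return max(candidates, key=lambda t: utility(t, xs, ref_preds, penalty))


xs, ys = make_data()
proxy = select_proxy(xs, ys, penalty=0.01)
print("leaves in selected proxy:", n_leaves(proxy))
```

Note that the proxy trees are fitted to the reference model's predictions rather than to the observed targets, which is what "mimicking the predictive behaviour of the reference model" amounts to in this sketch. Raising `penalty` pushes the selection toward smaller trees, and lowering it favours fidelity.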
Taxonomy
Methods / Interpretability
