Programs as Black-Box Explanations
Sameer Singh, Marco Tulio Ribeiro, Carlos Guestrin

TL;DR
This paper proposes small, interpretable programs as model-agnostic explanations for black-box classifiers, a flexible alternative to committing to a single explanation family such as linear models, decision trees, or rule lists.
Contribution
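It presents a prototype program induction method based on simulated annealing that generates local explanations by approximating a classifier's behavior around a specific prediction using random perturbations. Programs, as an explanation family, generalize over several existing interpretable families.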
Findings
Generated explanations are intuitive and accurate.
The method works on small datasets with a variety of classifiers.
Prototypes demonstrate the approach's flexibility.
Abstract
Recent work in model-agnostic explanations of black-box machine learning has demonstrated that interpretability of complex models does not have to come at the cost of accuracy or model flexibility. However, it is not clear what kind of explanations, such as linear models, decision trees, and rule lists, are the appropriate family to consider, and different tasks and models may benefit from different kinds of explanations. Instead of picking a single family of representations, in this work we propose to use "programs" as model-agnostic explanations. We show that small programs can be expressive yet intuitive as explanations, and generalize over a number of existing interpretable families. We propose a prototype program induction method based on simulated annealing that approximates the local behavior of black-box classifiers around a specific prediction using random perturbations.…
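The idea in the abstract (perturb an instance, query the black box, and search for a small program that mimics it locally) can be illustrated with a minimal sketch. This is not the authors' implementation: the program grammar (nested if-then-else tests over binary features), the mutation moves, the linear cooling schedule, the size penalty, and every function name below are assumptions chosen only to keep the example short.

```python
import math
import random

def perturb(x, n_samples, flip_prob, rng):
    """Sample neighbors of x by flipping each binary feature with probability flip_prob."""
    return [[(1 - v) if rng.random() < flip_prob else v for v in x]
            for _ in range(n_samples)]

def run(prog, x):
    """Evaluate a program: ("leaf", label) or ("if", i, then_prog, else_prog)."""
    while prog[0] == "if":
        _, i, then_p, else_p = prog
        prog = then_p if x[i] else else_p
    return prog[1]

def size(prog):
    """Number of nodes in the program; used to prefer small explanations."""
    return 1 if prog[0] == "leaf" else 1 + size(prog[2]) + size(prog[3])

def random_leaf(rng):
    return ("leaf", rng.randint(0, 1))

def mutate(prog, n_features, rng):
    """Propose a neighbor program by rewriting one randomly chosen node."""
    if prog[0] == "leaf":
        if rng.random() < 0.5:
            return random_leaf(rng)
        return ("if", rng.randrange(n_features), random_leaf(rng), random_leaf(rng))
    _, i, t, e = prog
    r = rng.random()
    if r < 0.2:
        return random_leaf(rng)                      # collapse subtree to a leaf
    if r < 0.4:
        return ("if", rng.randrange(n_features), t, e)  # test a different feature
    if r < 0.7:
        return ("if", i, mutate(t, n_features, rng), e)
    return ("if", i, t, mutate(e, n_features, rng))

def score(prog, samples, labels, size_penalty):
    """Local fidelity to the black box minus a penalty on program size."""
    agree = sum(run(prog, s) == y for s, y in zip(samples, labels)) / len(samples)
    return agree - size_penalty * size(prog)

def explain(black_box, x, n_samples=300, steps=2000, size_penalty=0.01, seed=0):
    """Simulated annealing over programs that imitate black_box near x."""
    rng = random.Random(seed)
    samples = perturb(x, n_samples, 0.3, rng)
    labels = [black_box(s) for s in samples]
    current = random_leaf(rng)
    cur_score = score(current, samples, labels, size_penalty)
    best, best_score = current, cur_score
    for step in range(steps):
        temp = max(1e-3, 0.1 * (1.0 - step / steps))  # assumed linear cooling
        cand = mutate(current, len(x), rng)
        cand_score = score(cand, samples, labels, size_penalty)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_score >= cur_score or rng.random() < math.exp((cand_score - cur_score) / temp):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
    return best, best_score

# Hypothetical black box: predicts 1 when feature 0 is on and feature 2 is off.
black_box = lambda x: int(x[0] and not x[2])
program, fidelity_score = explain(black_box, [1, 0, 0, 1])
print(program, round(fidelity_score, 3))
```

The size penalty is what keeps the explanation small: a larger program is accepted only if it buys enough extra agreement with the black box on the perturbed samples. A richer grammar (arithmetic on features, rule lists, thresholds) would slot into `run` and `mutate` without changing the annealing loop.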
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: Interpretability
