Synthesizing Pareto-Optimal Interpretations for Black-Box Models
Hazem Torfah, Shetal Shah, Supratik Chakraborty, S. Akshay, Sanjit A. Seshia

TL;DR
This paper introduces a multi-objective optimization framework that generates a diverse set of Pareto-optimal interpretations for black-box models by trading off correctness against explainability. It demonstrates the framework's effectiveness on neural network classifiers.
Contribution
It presents a general multi-objective method for synthesizing interpretations. The method lets users choose the interpretation template and the measures being traded off, relies on constraint solving, and generalizes existing single-objective approaches.
Findings
A single black-box model can admit multiple Pareto-optimal interpretations.
The approach can synthesize diverse interpretations that existing methods miss.
Applied to neural network classifiers, the approach yields a richer set of interpretations to choose from.
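To make "multiple Pareto-optimal interpretations" concrete, here is a minimal sketch that scores each candidate interpretation on two objectives to be maximized and keeps only the non-dominated ones. The names `Candidate`, `dominates`, `pareto_front`, and `best_by_weight` are hypothetical. The brute-force filter stands in for illustration only and is not the paper's constraint-solving method. Explainability is assumed here to be the negated number of predicates, and the candidate scores are made-up illustrative values, not results from the paper.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical scored interpretation; both scores are maximized."""
    name: str
    correctness: float     # assumed: agreement with the black box, in [0, 1]
    explainability: float  # assumed: negated size, so simpler means larger


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    at_least = a.correctness >= b.correctness and a.explainability >= b.explainability
    strictly = a.correctness > b.correctness or a.explainability > b.explainability
    return at_least and strictly


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates (O(n^2) brute force)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def best_by_weight(cands: list[Candidate], w: float) -> Candidate:
    """Single-objective baseline: maximize w * correctness + (1 - w) * explainability."""
    return max(cands, key=lambda c: w * c.correctness + (1 - w) * c.explainability)


if __name__ == "__main__":
    # Illustrative scores only.
    cands = [
        Candidate("1 predicate", 0.80, -1),
        Candidate("2 predicates", 0.90, -2),
        Candidate("3 predicates", 0.97, -3),
        Candidate("4 predicates", 0.95, -4),  # dominated by "3 predicates"
    ]
    for c in pareto_front(cands):
        print(c.name)  # prints "1 predicate", "2 predicates", "3 predicates"
    print(best_by_weight(cands, 0.5).name)  # a single weight yields a single answer
```

Three of the four candidates survive the filter, whereas the weighted-sum baseline returns exactly one answer for each choice of `w`. A fixed scalarization of this kind can also miss Pareto points that lie in non-convex regions of the front, which is one reason to search for the front directly.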
Abstract
We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a…
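The abstract lets users pick (1) a template class and (2) quantitative measures of correctness and explainability. The sketch below shows one possible instantiation of those two choices. The black box is a hypothetical stand-in function, the template is assumed to be a conjunction of threshold predicates `x[i] >= t`, correctness is assumed to be the fraction of sampled inputs on which the interpretation agrees with the black box, and explainability is assumed to be the negated predicate count. The names `black_box`, `interpret`, `correctness`, and `explainability` are illustrative and do not come from the paper.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
Predicate = Tuple[int, float]  # (feature index i, threshold t), read as x[i] >= t


def black_box(x: Point) -> bool:
    """Hypothetical stand-in for an opaque classifier such as a neural network."""
    return x[0] + 0.5 * x[1] >= 1.0


def interpret(preds: Sequence[Predicate], x: Point) -> bool:
    """Evaluate an interpretation from the assumed template: a conjunction of thresholds."""
    return all(x[i] >= t for i, t in preds)  # the empty conjunction is always True


def correctness(preds: Sequence[Predicate],
                model: Callable[[Point], bool],
                samples: Sequence[Point]) -> float:
    """Fraction of sampled inputs on which the interpretation agrees with the model."""
    agree = sum(interpret(preds, x) == model(x) for x in samples)
    return agree / len(samples)


def explainability(preds: Sequence[Predicate]) -> int:
    """Fewer predicates is easier to read, so score the negated predicate count."""
    return -len(preds)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample set is reproducible
    samples = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(1000)]
    for preds in ([(0, 1.0)], [(0, 0.5), (1, 1.0)]):
        print(preds, correctness(preds, black_box, samples), explainability(preds))
```

Swapping in a different template or a different pair of measures changes only `interpret`, `correctness`, or `explainability`. This reflects the flexibility the abstract describes, though the paper's own templates and measures may differ.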
