Synthesizing Pareto-Optimal Interpretations for Black-Box Models

Hazem Torfah; Shetal Shah; Supratik Chakraborty; S. Akshay; Sanjit A. Seshia

arXiv:2108.07307 · cs.LG · August 18, 2021

TL;DR

This paper introduces a multi-objective optimization framework that generates a diverse set of Pareto-optimal interpretations for black-box models, trading off correctness against explainability, and demonstrates its effectiveness on neural-network classifiers.

Contribution

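The paper presents a general multi-objective synthesis method for interpretations. Users choose the class of syntactic templates and the quantitative measures of correctness and explainability, and the method uses constraint solving to compute the resulting trade-offs. This generalizes existing approaches, which optimize a single objective and often target a single class of interpretations.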

Findings

1. Multiple Pareto-optimal interpretations can exist for a single black-box model.
2. The approach can synthesize diverse interpretations that existing methods miss.
3. Applied to neural-network classifiers, the approach exposes a richer set of interpretation options.
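To make the first two findings concrete, here is a minimal Python sketch that is not taken from the paper. The candidate names A to D and their (correctness, size) scores are hypothetical numbers chosen for illustration. The sketch computes which candidates are Pareto-optimal and then sweeps the weight `lam` of a single weighted-sum objective, `correctness - lam * size`. Candidate B is Pareto-optimal yet never wins for any weight, because it lies below the line through A and C. This is a standard limitation of weighted-sum scalarization, offered here as one illustrative way a single-objective method can miss interpretations, not as the paper's own argument.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical candidate interpretations: name -> (correctness, size).
# Higher correctness is better; smaller size (e.g. number of predicates)
# is assumed to be easier to explain. The numbers are made up.
CANDIDATES: Dict[str, Tuple[float, int]] = {
    "A": (0.50, 1),
    "B": (0.60, 3),
    "C": (0.90, 5),
    "D": (0.55, 4),  # dominated by B: less correct and larger
}


def dominates(p: Tuple[float, int], q: Tuple[float, int]) -> bool:
    """p dominates q if it is no worse on both objectives and better on one."""
    (cp, sp), (cq, sq) = p, q
    return cp >= cq and sp <= sq and (cp > cq or sp < sq)


def pareto_front(cands: Dict[str, Tuple[float, int]]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, score in cands.items()
        if not any(dominates(other, score) for other in cands.values())
    ]


def weighted_sum_winners(
    cands: Dict[str, Tuple[float, int]], lams: List[float]
) -> Set[str]:
    """Single-objective baseline: for each weight lam, keep only the
    candidate that maximizes correctness - lam * size."""
    winners: Set[str] = set()
    for lam in lams:
        best = max(cands, key=lambda n: cands[n][0] - lam * cands[n][1])
        winners.add(best)
    return winners


if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    winners = weighted_sum_winners(CANDIDATES, [i / 100 for i in range(101)])
    print("Pareto-optimal:", front)
    print("Weighted-sum winners:", sorted(winners))
    print("Missed by every weight:", sorted(set(front) - winners))
```

Returning the whole front instead of one scalarized optimum leaves the choice among A, B, and C to the user.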

Abstract

We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a…
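The two user choices the abstract describes, a class of syntactic templates and quantitative measures of correctness and explainability, can be sketched as ordinary function parameters. This is a minimal illustration under stated assumptions, not the authors' implementation. The black box is a hypothetical three-feature Boolean function. The template class is conjunctions of literals, correctness is the fraction of inputs on which a template agrees with the black box, and explainability is the negated number of literals. All function names are hypothetical. The paper uses constraint solving for the optimization, whereas this sketch simply enumerates every template and discards dominated ones, which is feasible only for tiny template classes.

```python
from itertools import combinations, product
from typing import Callable, List, Sequence, Tuple

# A literal is (feature index, required value); a conjunction is a tuple of literals.
Literal = Tuple[int, bool]
Conjunction = Tuple[Literal, ...]

N_FEATURES = 3
# Exhaustive input set; a real black box would need sampled inputs instead.
INPUTS = list(product([False, True], repeat=N_FEATURES))


def black_box(x: Tuple[bool, ...]) -> bool:
    """Hypothetical stand-in for an opaque classifier."""
    return x[0] and (x[1] or x[2])


def conjunction_templates(max_len: int) -> List[Conjunction]:
    """Hypothetical template class: conjunctions of up to max_len literals over
    distinct features. The empty conjunction means 'always True'."""
    templates: List[Conjunction] = []
    for k in range(max_len + 1):
        for feats in combinations(range(N_FEATURES), k):
            for values in product([True, False], repeat=k):
                templates.append(tuple(zip(feats, values)))
    return templates


def holds(conj: Conjunction, x: Tuple[bool, ...]) -> bool:
    """True if input x satisfies every literal of the conjunction."""
    return all(x[i] == v for i, v in conj)


def correctness(conj: Conjunction) -> float:
    """Fraction of inputs on which the interpretation agrees with the black box."""
    agree = sum(holds(conj, x) == black_box(x) for x in INPUTS)
    return agree / len(INPUTS)


def explainability(conj: Conjunction) -> int:
    """Fewer literals are assumed to be easier to read, so negate the length."""
    return -len(conj)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q when it is >= on every measure and > on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_interpretations(
    templates: List[Conjunction],
    measures: List[Callable[[Conjunction], float]],
) -> List[Tuple[Conjunction, Tuple[float, ...]]]:
    """Brute force: score every template on every measure, keep the non-dominated ones."""
    scored = [(t, tuple(m(t) for m in measures)) for t in templates]
    return [(t, s) for t, s in scored if not any(dominates(o, s) for _, o in scored)]


if __name__ == "__main__":
    front = pareto_interpretations(conjunction_templates(3), [correctness, explainability])
    for conj, scores in front:
        print(conj, scores)
```

Swapping in a different template enumerator or a different list of measures changes the trade-off being explored without touching `pareto_interpretations`, which loosely mirrors the user-selectable choices the abstract describes.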
