Human-in-the-Loop Interpretability Prior

Isaac Lage; Andrew Slavin Ross; Been Kim; Samuel J. Gershman; Finale Doshi-Velez

arXiv:1805.11571 · stat.ML · November 1, 2018 · 45 cites

Open Access

TL;DR

This paper introduces a human-in-the-loop approach to optimize machine learning models for interpretability by directly involving human feedback, moving beyond proxy measures like sparsity.

Contribution

It presents an algorithm that minimizes the number of user studies needed to identify models that are both accurate and interpretable on a given dataset.
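This page does not spell out the algorithm. The Python below is a minimal sketch of the general idea, not the authors' procedure. It spends a fixed budget of user studies on a finite pool of candidate models. To decide which model to show people next, it uses Bayesian optimization with a Gaussian-process surrogate and an upper-confidence-bound rule. The sketch rests on three assumptions:

- Each candidate is described by a feature vector (for example, proxy values such as sparsity) and a cheap-to-compute accuracy.
- `run_user_study` is a hypothetical stand-in for an expensive human experiment that returns a score in [0, 1].
- Scoring a model by accuracy × human score is our choice, not necessarily the paper's objective.

```python
import math
import random


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))


def cholesky(K):
    """Return lower-triangular L with L @ L^T == K (K positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def gp_posterior(X, y, x_star, noise=1e-3):
    """Zero-mean GP posterior mean and variance at x_star given (X, y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    w = solve_lower(L, y)                                # L^{-1} y
    v = solve_lower(L, [rbf(a, x_star) for a in X])      # L^{-1} k_*
    mean = sum(vi * wi for vi, wi in zip(v, w))          # k_*^T K^{-1} y
    var = max(rbf(x_star, x_star) - sum(vi * vi for vi in v), 0.0)
    return mean, var


def select_model(candidates, run_user_study, budget, beta=2.0, seed=0):
    """Pick a model while calling the (hypothetical) user-study oracle at most
    `budget` times (at least once).

    candidates: list of (features, accuracy) pairs.
    run_user_study: callable taking a candidate index, returning a human score.
    Returns the queried index with the best accuracy * observed human score.
    """
    rng = random.Random(seed)
    queried = {}  # candidate index -> observed human score
    first = rng.randrange(len(candidates))
    queried[first] = run_user_study(first)
    while len(queried) < min(budget, len(candidates)):
        X = [candidates[i][0] for i in queried]
        y = [queried[i] for i in queried]
        best_i, best_ucb = None, -math.inf
        for i, (feats, acc) in enumerate(candidates):
            if i in queried:
                continue
            mean, var = gp_posterior(X, y, feats)
            # Optimistic estimate of accuracy * human score for an unstudied model.
            ucb = acc * (mean + beta * math.sqrt(var))
            if ucb > best_ucb:
                best_i, best_ucb = i, ucb
        queried[best_i] = run_user_study(best_i)
    return max(queried, key=lambda i: candidates[i][1] * queried[i])
```

Only `budget` candidates are ever sent to `run_user_study`, so the surrogate's job is to make those few queries count. A larger `beta` favors exploring models whose human score is still uncertain.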

Findings

01. Human-subject results suggest that different datasets favor different interpretability proxies.

02. The approach identifies models that are both predictive and interpretable.

03. Fewer user studies are needed to find such models.

Abstract

We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user studies to find models that are both predictive and interpretable and demonstrate our approach on several data sets. Our human subjects results show trends towards different proxy notions of interpretability on different datasets, which suggests that different proxies are preferred on different tasks.
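The Python below is a hedged sketch that makes the two proxies named in the abstract concrete by computing each one. Sparsity is the number of non-zero coefficients in a linear model. Operation count is the number of comparisons a decision tree makes to classify an input. The `Node` class and these exact definitions are illustrative assumptions; the paper may define its proxies differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary decision-tree node; leaves have feature=None."""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None


def sparsity(weights, tol=1e-12):
    """Proxy 1: number of non-zero coefficients in a linear model."""
    return sum(1 for w in weights if abs(w) > tol)


def ops_to_predict(tree, x):
    """Proxy 2: number of threshold comparisons needed to classify x."""
    ops, node = 0, tree
    while node.feature is not None:
        ops += 1
        node = node.left if x[node.feature] <= node.threshold else node.right
    return ops


def mean_ops(tree, X):
    """Average operation count over a set of inputs."""
    return sum(ops_to_predict(tree, x) for x in X) / len(X)
```

For example, the weight vector `[0.0, 1.2, 0.0, -0.3]` has sparsity 2. Two models with similar accuracy can rank differently under these two proxies, which is why a human judgment of interpretability may prefer one proxy on one dataset and another elsewhere.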

Taxonomy

Topics: Fault Detection and Control Systems

Methods: Interpretability