Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior

Angie Boggust; Benjamin Hoover; Arvind Satyanarayan; Hendrik Strobelt

arXiv:2107.09234 · cs.LG · March 28, 2022

TL;DR

Shared Interest introduces quantitative metrics to compare neural network saliency with human reasoning, enabling large-scale analysis of model behavior and identification of recurring patterns to assess trustworthiness.

Contribution

The paper presents Shared Interest, a novel set of metrics for systematically comparing model saliency with human annotations, facilitating large-scale analysis of model reasoning patterns.
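As a minimal sketch of what comparing model saliency with human annotations can look like, the snippet below assumes both have already been binarized into sets of feature indices (here, pixel coordinates). It scores their overlap with three set ratios: intersection over union, the fraction of the annotation covered by saliency, and the fraction of saliency that falls inside the annotation. The function name `overlap_scores` and this exact choice of ratios are illustrative assumptions, not definitions stated on this page. The authors' implementation is in the official mitvis/shared-interest repository.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[int, int]  # hypothetical: a (row, col) pixel coordinate


def overlap_scores(saliency: Set[Feature], ground_truth: Set[Feature]) -> Dict[str, float]:
    """Illustrative overlap ratios between a binarized saliency set and a human annotation set."""
    if not saliency or not ground_truth:
        raise ValueError("both feature sets must be non-empty")
    shared = len(saliency & ground_truth)
    return {
        # agreement relative to everything either side marked as important
        "iou": shared / len(saliency | ground_truth),
        # share of the human-annotated region that the model's saliency covers
        "ground_truth_coverage": shared / len(ground_truth),
        # share of the model's salient region that lies inside the annotation
        "saliency_coverage": shared / len(saliency),
    }


# Toy 2x3 example: saliency covers the left two columns, the annotation covers the top row.
saliency = {(0, 0), (0, 1), (1, 0), (1, 1)}
annotation = {(0, 0), (0, 1), (0, 2)}
print(overlap_scores(saliency, annotation))
# → iou 0.4, ground_truth_coverage ≈ 0.667, saliency_coverage 0.5
```

Reporting three ratios instead of a single score separates two different disagreements: a model that relies on only part of the annotated object has high saliency coverage but low ground truth coverage, while a model that leans on surrounding context shows the reverse.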

Findings

01. Identified eight recurring model behavior patterns.
02. Demonstrated how Shared Interest can assess model trustworthiness.
03. Showed that Shared Interest uncovers issues missed by manual analysis.

Abstract

Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations). By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model. Working with representative real-world users, we…
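The ranking, sorting, and aggregation step described in the abstract can be sketched as follows. This assumes continuous saliency maps are binarized with a fixed threshold and scored by intersection over union against the annotation. The threshold value, the toy `records` data, and the helper names `binarize` and `iou` are hypothetical choices for illustration, and the paper's own binarization and scoring may differ.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

Feature = Tuple[int, int]  # hypothetical: a (row, col) pixel coordinate


def binarize(saliency_map: List[List[float]], threshold: float) -> Set[Feature]:
    """Keep pixels whose saliency meets the threshold (an illustrative binarization choice)."""
    return {
        (r, c)
        for r, row in enumerate(saliency_map)
        for c, value in enumerate(row)
        if value >= threshold
    }


def iou(a: Set[Feature], b: Set[Feature]) -> float:
    """Intersection over union of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical records: (input id, saliency map, annotated pixels, prediction correct?)
records = [
    ("img_a", [[0.9, 0.8], [0.1, 0.0]], {(0, 0), (0, 1)}, True),
    ("img_b", [[0.0, 0.2], [0.7, 0.9]], {(0, 0), (0, 1)}, False),
    ("img_c", [[0.6, 0.1], [0.5, 0.2]], {(0, 0), (1, 0), (1, 1)}, True),
]

# Score every input by how well its binarized saliency matches the human annotation.
scored = [
    (name, iou(binarize(smap, threshold=0.5), gt), correct)
    for name, smap, gt, correct in records
]

# Rank: least human-aligned inputs first, so they can be inspected before the rest.
ranking = sorted(scored, key=lambda item: item[1])
print([name for name, _, _ in ranking])  # → ['img_b', 'img_c', 'img_a']

# Aggregate: mean alignment score for correct vs. incorrect predictions.
by_outcome: Dict[bool, List[float]] = {}
for _, score, correct in scored:
    by_outcome.setdefault(correct, []).append(score)
summary = {outcome: mean(scores) for outcome, scores in by_outcome.items()}
print(summary)
```

Sorting in ascending order puts the least-aligned inputs at the top, which is one way to replace cherry-picking with a systematic pass through the data. Grouping by prediction correctness is only one example of an aggregation key. Predicted class or label would work the same way.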

Code & Models

Repositories

mitvis/shared-interest (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics · Machine Learning and Data Classification

Methods: High-Order Consensuses