Contrastive Explanations for Model Interpretability

Alon Jacovi; Swabha Swayamdipta; Shauli Ravfogel; Yanai Elazar; Yejin Choi; Yoav Goldberg

arXiv:2103.01378·cs.CL·September 15, 2021

TL;DR

This paper introduces a novel contrastive explanation methodology for classification models that enhances interpretability by focusing on features that differentiate specific decision pairs, demonstrated on text classification tasks.

Contribution

The paper presents a new approach to generate contrastive explanations by modifying model representations and behavior to highlight decision-specific features.

Findings

01 Contrastive explanations improve the interpretability of model decisions.

02 The method isolates the features that distinguish one specific label from another.

03 The approach applies to both high-level concept attribution and low-level input token/span attribution.

Abstract

Contrastive explanations clarify why an event occurred in contrast to another. They are inherently more intuitive for humans to both produce and comprehend. We propose a methodology to produce contrastive explanations for classification models by modifying the representation to disregard non-contrastive information, and modifying model behavior to be based only on contrastive reasoning. Our method is based on projecting the model representation to a latent space that captures only the features that are useful (to the model) to differentiate two potential decisions. We demonstrate the value of contrastive explanations by analyzing two different scenarios, using both high-level abstract concept attribution and low-level input token/span attribution, on two widely used text classification tasks. Specifically, we produce explanations for answering: for which label, and against which alternative…
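
The projection idea in the abstract can be illustrated with a minimal sketch, assuming the model ends in a linear classification layer. Under that assumption, the only direction of the representation that affects the choice between two labels is the difference of their weight vectors, so projecting onto that direction keeps the score gap between the two labels and discards everything else. The names `contrastive_projection`, `W`, `h`, `fact`, and `foil` are illustrative assumptions and are not taken from the paper or from the allenai/contrastive-explanations repository.

```python
import numpy as np


def contrastive_projection(h, W, fact, foil):
    """Project representation h onto the direction that separates two labels.

    Assumption: the classifier is linear, logits = W @ h (+ bias), so the
    logit gap between `fact` and `foil` depends on h only through the
    direction u = W[fact] - W[foil].
    """
    u = W[fact] - W[foil]              # contrastive direction (hypothetical name)
    u_hat = u / np.linalg.norm(u)      # unit vector along that direction
    return np.dot(h, u_hat) * u_hat    # component of h along u


# Toy setup: 3 labels, 8-dimensional representation (random, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
h = rng.normal(size=8)

h_c = contrastive_projection(h, W, fact=0, foil=1)

# The gap between the two labels' logits is unchanged by the projection,
# while the discarded part of h has no effect on that gap.
gap_before = (W[0] - W[1]) @ h
gap_after = (W[0] - W[1]) @ h_c
residual_effect = (W[0] - W[1]) @ (h - h_c)
```

Here `h_c` carries only the information the toy classifier uses to prefer label 0 over label 1. An attribution method applied to `h_c` instead of `h` would then, in this simplified setting, explain the choice of one label against the other rather than the prediction as a whole.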

Code & Models

Repositories

allenai/contrastive-explanations
Official
