TL;DR
This paper introduces a novel contrastive explanation methodology for classification models that enhances interpretability by focusing on features that differentiate specific decision pairs, demonstrated on text classification tasks.
Contribution
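The paper presents an approach for generating contrastive explanations: it projects the model's representation into a latent space that keeps only the features separating two candidate decisions, and restricts the model's behavior to that contrastive reasoning.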
Findings
Contrastive explanations improve the interpretability of model decisions.
The method distinguishes the features that are relevant to a specific label from those relevant to its alternative.
The approach applies to both high-level abstract concept attribution and low-level input token/span attribution.
Abstract
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently more intuitive for humans to both produce and comprehend. We propose a methodology to produce contrastive explanations for classification models by modifying the representation to disregard non-contrastive information, and modifying model behavior to be based only on contrastive reasoning. Our method is based on projecting the model representation into a latent space that captures only the features that are useful (to the model) to differentiate two potential decisions. We demonstrate the value of contrastive explanations by analyzing two different scenarios, using both high-level abstract concept attribution and low-level input token/span attribution, on two widely used text classification tasks. Specifically, we produce explanations for answering: for which label, and against which alternative…
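The projection idea in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the abstract, that the classifier ends in a linear layer with one weight row per label and no bias. Under that assumption, the only direction in representation space that affects the choice between a predicted label (the "fact") and an alternative (the "foil") is the difference of their weight rows, so projecting the representation onto that direction discards everything non-contrastive. The names `project_contrastive`, `logit_gap`, `fact`, and `foil` are hypothetical and chosen for illustration only; this is not presented as the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_direction(W, fact, foil):
    """Difference of the weight rows for the fact and foil labels.

    Assumption: W is the weight matrix of a bias-free linear output layer,
    given as a list of rows, one row per label.
    """
    return [a - b for a, b in zip(W[fact], W[foil])]


def project_contrastive(h, W, fact, foil):
    """Orthogonally project representation h onto the contrastive direction."""
    u = contrastive_direction(W, fact, foil)
    norm_sq = dot(u, u)
    if norm_sq == 0:
        raise ValueError("fact and foil have identical weights; nothing to contrast")
    scale = dot(h, u) / norm_sq
    return [scale * x for x in u]


def logit_gap(h, W, fact, foil):
    """Logit of the fact label minus logit of the foil label."""
    return dot(W[fact], h) - dot(W[foil], h)


# Toy example: labels 0 and 1 share the third feature, so it cannot
# help the model choose between them.
W = [
    [1.0, 0.0, 2.0],  # label 0
    [0.0, 1.0, 2.0],  # label 1
    [1.0, 1.0, 0.0],  # label 2
]
h = [3.0, 1.0, 5.0]

h_contrastive = project_contrastive(h, W, fact=0, foil=1)
```

In this toy case the third feature contributes equally to labels 0 and 1, so the projected vector has a zero in that position, while the gap between the two logits is unchanged by the projection. An attribution method applied to the projected representation would therefore only see features that bear on "why label 0 rather than label 1".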
