Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions

Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov

arXiv:2005.06676 · cs.CL · May 15, 2020 · 19 citations

TL;DR

This paper explores the use of influence functions to interpret NLP models by identifying influential training examples, offering insights into model decisions and revealing data artifacts, especially in tasks like natural language inference.

Contribution

It investigates influence functions as an alternative to saliency maps for interpreting NLP models and develops a new influence-based quantitative measure for detecting data artifacts.

Findings

1. Influence functions are particularly useful for natural language inference, a task in which saliency maps may not have a clear interpretation.
2. They effectively identify the training examples most influential for a given model decision.
3. The method reveals artifacts in the training data that affect model behavior.

Abstract

Modern deep learning models for NLP are notoriously opaque. This has motivated the development of methods for interpreting such models, e.g., via gradient-based saliency maps or the visualization of attention weights. Such approaches aim to provide explanations for a particular model prediction by highlighting important words in the corresponding input text. While this might be useful for tasks where decisions are explicitly influenced by individual tokens in the input, we suspect that such highlighting is not suitable for tasks where model decisions should be driven by more complex reasoning. In this work, we investigate the use of influence functions for NLP, providing an alternative approach to interpreting neural text classifiers. Influence functions explain the decisions of a model by identifying influential training examples. Despite the promise of this approach, influence…
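
Influence functions, as formulated for machine learning by Koh and Liang (2017), score a training example z by how much upweighting it would change the loss on a test example: I_up,loss(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the training objective at the fitted parameters. The following is a toy sketch under stated assumptions, not the authors' implementation. It uses a hypothetical two-parameter, L2-regularized logistic regression that is small enough to invert the Hessian exactly. Large models such as BERT instead require approximations of the inverse-Hessian-vector product. All function names and the data are illustrative.

```python
import math

# Assumption: a hypothetical toy model (logistic regression, weight + bias) chosen
# so the Hessian is 2x2 and can be inverted exactly; not the paper's setup.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve2(h, g):
    """Solve the 2x2 linear system h @ s = g."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
            (h[0][0] * g[1] - h[1][0] * g[0]) / det]

def log_loss(theta, x, y):
    p = sigmoid(dot(theta, x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_loss(theta, x, y):
    """Gradient of the per-example log loss: (p - y) * x."""
    p = sigmoid(dot(theta, x))
    return [(p - y) * xa for xa in x]

def grad_hess(theta, xs, ys, lam, weights):
    """Gradient and Hessian of (1/n) * sum_i c_i * L(z_i) + (lam/2) * ||theta||^2."""
    n = len(xs)
    g = [lam * theta[0], lam * theta[1]]
    h = [[lam, 0.0], [0.0, lam]]
    for x, y, c in zip(xs, ys, weights):
        p = sigmoid(dot(theta, x))
        for a in range(2):
            g[a] += c * (p - y) * x[a] / n
            for b in range(2):
                h[a][b] += c * p * (1.0 - p) * x[a] * x[b] / n
    return g, h

def fit(xs, ys, lam, weights=None, iters=50):
    """Minimize the regularized objective with plain Newton steps."""
    if weights is None:
        weights = [1.0] * len(xs)
    theta = [0.0, 0.0]
    for _ in range(iters):
        g, h = grad_hess(theta, xs, ys, lam, weights)
        step = solve2(h, g)
        theta = [theta[0] - step[0], theta[1] - step[1]]
    return theta

def influence_up_loss(theta, xs, ys, lam, x_test, y_test):
    """I_up,loss(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i) for each training z_i."""
    _, h = grad_hess(theta, xs, ys, lam, [1.0] * len(xs))
    # s_test = H^{-1} grad L(z_test), computed once; valid because H is symmetric.
    s_test = solve2(h, grad_loss(theta, x_test, y_test))
    return [-dot(s_test, grad_loss(theta, x, y)) for x, y in zip(xs, ys)]

# Hypothetical 1-D data with a constant bias feature; examples 2 and 3 go against the trend.
xs = [(-2.0, 1.0), (-1.0, 1.0), (-0.5, 1.0), (0.5, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [0, 0, 1, 0, 1, 1]
lam = 0.1
theta = fit(xs, ys, lam)
x_test, y_test = (0.3, 1.0), 1
scores = influence_up_loss(theta, xs, ys, lam, x_test, y_test)
# Negative score: upweighting that example lowers the test loss (it supports the prediction).
# Positive score: upweighting it raises the test loss (it opposes the prediction).
ranking = sorted(range(len(xs)), key=lambda i: scores[i])
```

Sorting by score orders training examples from most supporting to most opposing for this one test prediction. Solving once for H⁻¹∇L(z_test) and reusing it across all training examples follows the "s_test" trick described by Koh and Liang, which keeps the cost to one inverse-Hessian solve per test example.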

Code & Models

Repositories

xhan77/influence-function-analysis (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning