Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov

TL;DR
This paper explores the use of influence functions to interpret NLP models by identifying influential training examples, offering insights into model decisions and revealing data artifacts, especially in tasks like natural language inference.
Contribution
It introduces influence functions as an alternative to saliency maps for NLP interpretation and develops a new quantitative measure to detect data artifacts.
Findings
Influence functions outperform saliency maps in natural language inference tasks.
They effectively identify influential training examples for model decisions.
The method reveals artifacts in training data that affect model behavior.
Abstract
Modern deep learning models for NLP are notoriously opaque. This has motivated the development of methods for interpreting such models, e.g., via gradient-based saliency maps or the visualization of attention weights. Such approaches aim to provide explanations for a particular model prediction by highlighting important words in the corresponding input text. While this might be useful for tasks where decisions are explicitly influenced by individual tokens in the input, we suspect that such highlighting is not suitable for tasks where model decisions should be driven by more complex reasoning. In this work, we investigate the use of influence functions for NLP, providing an alternative approach to interpreting neural text classifiers. Influence functions explain the decisions of a model by identifying influential training examples. Despite the promise of this approach, influence…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning
