Understanding Black-box Predictions via Influence Functions
Pang Wei Koh, Percy Liang

TL;DR
This paper adapts influence functions from robust statistics to explain black-box model predictions by tracing them back to influential training data, enabling better understanding, debugging, and data error detection.
Contribution
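It introduces a simple, efficient implementation of influence functions that needs only oracle access to gradients and Hessian-vector products, which lets them scale to modern machine learning models. It also shows that approximate influence functions remain informative on non-convex and non-differentiable models, where the underlying theory breaks down.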
Findings
Influence functions help understand model behavior and debug models.
They can detect dataset errors effectively.
They enable the creation of training-set attacks that are visually indistinguishable from the original training data.
Abstract
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating…
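The abstract's point about needing only gradients and Hessian-vector products can be illustrated with a minimal sketch. It assumes an L2-regularized logistic regression on synthetic data, with the regularizer folded into the per-point loss; the model, the data, and every function name below are illustrative choices, not the authors' code. It evaluates the paper's upweighting influence, I_up,loss(z, z_test) = -grad L(z_test)^T H^{-1} grad L(z), by solving H s = grad L(z_test) with conjugate gradients so that the Hessian is never formed when computing influences.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_loss(theta, x, y, lam):
    """Gradient of the per-point loss: log loss + (lam/2)*||theta||^2, y in {0, 1}."""
    return (sigmoid(x @ theta) - y) * x + lam * theta

def hvp(theta, X, v, lam):
    """Hessian-vector product of the mean training loss, without forming H."""
    p = sigmoid(X @ theta)
    return X.T @ (p * (1 - p) * (X @ v)) / len(X) + lam * v

def conjugate_gradient(matvec, b, iters=100, tol=1e-20):
    """Solve H s = b for positive-definite H using only matvec(v) = H v."""
    s = np.zeros_like(b)
    r = b.copy()          # residual b - H s (s starts at zero)
    d = r.copy()          # search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Hd = matvec(d)
        alpha = rs / (d @ Hd)
        s += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return s

def fit(X, y, lam, steps=50):
    """Newton's method on the mean loss (toy scale only; builds H explicitly)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ theta)
        g = X.T @ (p - y) / len(X) + lam * theta
        H = X.T @ (X * (p * (1 - p))[:, None]) / len(X) + lam * np.eye(X.shape[1])
        theta -= np.linalg.solve(H, g)
    return theta

def influences(theta, X, y, x_test, y_test, lam):
    """Influence of upweighting each training point on the loss at one test point."""
    s_test = conjugate_gradient(lambda v: hvp(theta, X, v, lam),
                                grad_loss(theta, x_test, y_test, lam))
    train_grads = (sigmoid(X @ theta) - y)[:, None] * X + lam * theta
    return -(train_grads @ s_test)

# Synthetic, noisy (non-separable) binary classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) + rng.normal(size=200) > 0).astype(float)
x_test, y_test = rng.normal(size=5), 1.0
lam = 0.1

theta_hat = fit(X, y, lam)
infl = influences(theta_hat, X, y, x_test, y_test, lam)
most_helpful = np.argsort(infl)[:5]   # most negative: upweighting lowers the test loss
most_harmful = np.argsort(infl)[-5:]  # most positive: upweighting raises the test loss
```

The vector `s_test` is computed once per test point and reused for every training point, so scoring the whole training set costs one linear solve plus one gradient per training example. For models too large for exact conjugate gradients, a stochastic approximation of the inverse-Hessian-vector product would take the place of `conjugate_gradient`.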
Code & Models
- https://worksheets.codalab.org/worksheets/0x2b314dc3536b482dbba02783a24719fd (official)
- AnonymizedAuthor663/NNIF_adv_defense (TensorFlow)
- ShinKyuY/Understanding-Black-box-Predictions-via-Influence-Functions-tutorial-MNIST-7-vs-1-Classification
- aai-institute/pyDVL (PyTorch)
- nimarb/pytorch_influence_functions (PyTorch)
Videos
The Gap Between Humans and Machines Is ___ [Dr. Max Bartolo] · YouTube
Understanding Black-box Predictions via Influence Functions · YouTube
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
