Understanding Black-box Predictions via Influence Functions

Pang Wei Koh; Percy Liang

arXiv:1703.04730 · stat.ML · January 1, 2021 · 1.2k cites

TL;DR

This paper adapts influence functions from robust statistics to explain black-box model predictions by tracing them back to influential training data, enabling better understanding, debugging, and data error detection.

Contribution

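It introduces an efficient way to scale influence functions to modern machine learning models, requiring only oracle access to gradients and Hessian-vector products. It also shows that the resulting approximations remain useful for interpretability and debugging on non-convex and non-differentiable models, where the underlying theory breaks down.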
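For orientation, the quantity being scaled up can be written as follows. This is the standard form of the paper's influence of a training point on a test loss; the symbols are not defined on this page and are introduced here as assumptions: $z$ is a training point, $z_{\text{test}}$ a test point, $L$ the loss, $\hat{\theta}$ the fitted parameters, and $n$ the number of training points.

```latex
\begin{align}
  H_{\hat{\theta}} &= \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}), \\
  \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
    &= -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
       H_{\hat{\theta}}^{-1}\,
       \nabla_{\theta} L(z, \hat{\theta}).
\end{align}
```

Removing $z$ from the training set corresponds to upweighting it by $-1/n$, so the predicted change in test loss is approximately $-\frac{1}{n}\,\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$. The only expensive piece is the product $H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})$, which can be approximated with Hessian-vector products instead of forming or inverting $H_{\hat{\theta}}$.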

Findings

01. Influence functions help explain model behavior and debug models.

02. They can be used to detect dataset errors.

03. They enable training-set attacks that are visually indistinguishable from genuine data.
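As an illustration of finding 02, one heuristic is to rank each training point by its influence on its own loss and send the most extreme points for manual review. The sketch below is a toy setup, not the paper's experiment: the synthetic data, the number of flipped labels, the regularisation strength `lam`, and the names `self_influence` and `review_order` are all assumptions chosen for illustration. The model is small enough to form the Hessian explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 300, 4, 0.1  # illustrative sizes and L2 strength (assumptions)

# Synthetic binary classification data with 15 deliberately flipped labels.
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5, 1.5])
y = (X @ w_true > 0).astype(float)
flipped = rng.choice(n, size=15, replace=False)
y[flipped] = 1.0 - y[flipped]

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Fit L2-regularised logistic regression with Newton's method.
theta = np.zeros(d)
for _ in range(50):
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / n + lam * theta
    H = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(d)
    theta -= np.linalg.solve(H, grad)

# Hessian and per-example loss gradients at the fitted parameters.
p = sigmoid(X @ theta)
H = (X.T * (p * (1.0 - p))) @ X / n + lam * np.eye(d)
G = (p - y)[:, None] * X  # row i is the gradient of example i's loss

# Self-influence: -g_i^T H^{-1} g_i for every training point i.
H_inv_G = np.linalg.solve(H, G.T).T
self_influence = -np.einsum("ij,ij->i", G, H_inv_G)

# Most negative first: these points most strongly drive their own fit.
review_order = np.argsort(self_influence)
hits = len(set(review_order[:30].tolist()) & set(flipped.tolist()))
print(f"{hits} of the 15 flipped labels appear in the first 30 reviewed")
```

Because the Hessian here is positive definite, every self-influence value is non-positive, so ranking by the most negative values is the same as ranking by magnitude.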

Abstract

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating…
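The phrase "oracle access to gradients and Hessian-vector products" can be made concrete with a minimal sketch. It assumes a small L2-regularised logistic regression on synthetic data and uses conjugate gradient to solve for the inverse-Hessian-vector product without ever forming the Hessian inside the solver. All function names, the data, and the `damping` value are hypothetical choices for illustration, and the L2 term is treated as part of the objective but not of the per-example loss. This is not the authors' implementation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_loss(theta, x, y):
    """Gradient of the logistic loss for one example (label y in {0, 1})."""
    return (sigmoid(x @ theta) - y) * x

def hvp(theta, X, v, damping):
    """Hessian-vector product of the mean training loss plus an L2 term.
    The Hessian itself is never materialised."""
    p = sigmoid(X @ theta)
    return X.T @ (p * (1.0 - p) * (X @ v)) / X.shape[0] + damping * v

def conjugate_gradient(matvec, b, iters=100, tol=1e-10):
    """Solve A s = b for symmetric positive-definite A using only matvec."""
    s = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        s += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return s

def fit(X, y, damping, steps=50):
    """Newton's method for the regularised objective (fine at this toy size)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ theta)
        g = X.T @ (p - y) / n + damping * theta
        H = (X.T * (p * (1.0 - p))) @ X / n + damping * np.eye(d)
        theta -= np.linalg.solve(H, g)
    return theta

rng = np.random.default_rng(0)
damping = 0.1  # assumed L2 strength
X = rng.normal(size=(200, 5))
theta_true = rng.normal(size=5)
y = (rng.random(200) < sigmoid(X @ theta_true)).astype(float)
x_test, y_test = rng.normal(size=5), 1.0

theta = fit(X, y, damping)

# s_test = H^{-1} grad L(z_test), obtained from Hessian-vector products only.
g_test = grad_loss(theta, x_test, y_test)
s_test = conjugate_gradient(lambda v: hvp(theta, X, v, damping), g_test)

# Influence of upweighting each training point on the test loss.
G = (sigmoid(X @ theta) - y)[:, None] * X  # per-example gradients, one per row
influences = -G @ s_test

# Negative values mark points whose upweighting would lower the test loss.
top = np.argsort(-np.abs(influences))[:5]
print("most influential training indices:", top)
```

The key design point is that `s_test` is computed once per test point and then reused for every training point, so scoring the whole training set costs one linear solve plus one gradient per example.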

Code & Models

Videos

The Gap Between Humans and Machines Is ___ [Dr. Max Bartolo] · YouTube

Understanding Black-box Predictions via Influence Functions · YouTube

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications