On the Accuracy of Influence Functions for Measuring Group Effects
Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang

TL;DR
This paper investigates how accurately influence functions estimate the effect of removing large groups of training points from a model. Across a range of real-world datasets, the predicted effects often correlate well with the actual effects, even when the absolute and relative errors are large.
Contribution
The study extends influence function analysis from individual points to large groups, providing empirical evidence and theoretical insights into their accuracy in real-world scenarios.
Findings
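Across many types of groups and a range of real-world datasets, the group effects predicted by influence functions often correlate well with the actual effects, even when absolute and relative errors are large.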
Theoretical analysis explains conditions under which influence functions are accurate.
Real-world datasets possess properties that support the effectiveness of influence approximations.
Abstract
Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model, and so are commonly used to study the effect of individual points in large datasets. However, we often want to study the effects of large groups of training points, e.g., to diagnose batch effects or apportion credit between different data sources. Removing such large groups can result in significant changes to the model. Are influence functions still accurate in this setting? In this paper, we find that across many different types of groups and for a range of real-world datasets, the predicted effect (using influence functions) of a group correlates surprisingly well with its actual effect, even if the absolute and relative errors are large. Our…
Taxonomy
Topics: Advanced Statistical Modeling Techniques · Computational and Text Analysis Methods · Psychometric Methodologies and Testing
