Understanding Influence Functions and Datamodels via Harmonic Analysis
Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

TL;DR
This paper uses harmonic analysis to give a theoretical account of influence functions and datamodels in deep learning, including an exact characterization of the learnt datamodel and an efficient way to estimate its residual error.
Contribution
It offers a Fourier-based characterization of datamodels, an efficient method for estimating the residual error of the optimum linear datamodel, and insights into when group influences sum linearly.
Findings
Fourier coefficients exactly characterize the learnt datamodel.
The residual error and quality of the optimum linear datamodel can be estimated efficiently without training the datamodel.
Insights into when group influences sum linearly.
Abstract
Influence functions estimate the effect of individual data points on a model's predictions on test data, and were adapted to deep learning in Koh and Liang [2017]. They have been used for detecting data poisoning, identifying helpful and harmful examples, estimating the influence of groups of datapoints, etc. Recently, Ilyas et al. [2022] introduced a linear regression method, which they termed datamodels, to predict the effect of training points on outputs on test data. The current paper seeks to provide a better theoretical understanding of these empirical phenomena. The primary tools are harmonic analysis and the idea of noise stability. Contributions include: (a) Exact characterization of the learnt datamodel in terms of Fourier coefficients. (b) An efficient method to estimate the residual error and quality of the optimum linear datamodel without having to train the datamodel. (c) New insights into…
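Illustration (not from the paper): the link between a linear datamodel and Fourier coefficients can be checked on a toy problem. The sketch below assumes a simplified setting that may differ from the paper's: each training point is included (+1) or excluded (-1) uniformly at random, the regression is plain least squares with no regularization, and all 2^n subsets are enumerated. The function `f_values` is a made-up stand-in for a test-point output, and `fourier_coefficients` is a hypothetical helper. Under these assumptions, the fitted intercept and weights equal the degree-0 and degree-1 Fourier coefficients, and the residual error equals the squared mass of the higher-order coefficients (Parseval's identity).

```python
import itertools
import numpy as np


def fourier_coefficients(f_values, points):
    """Return {S: f_hat(S)} for f on {-1,+1}^n under the uniform distribution.

    f_hat(S) = E[f(x) * prod_{i in S} x_i], computed by full enumeration.
    (Hypothetical helper for this illustration only.)
    """
    n = points.shape[1]
    coeffs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            chi = np.prod(points[:, list(S)], axis=1)  # empty product is 1
            coeffs[S] = float(np.mean(f_values * chi))
    return coeffs


n = 4  # number of training points in the toy problem
# Every inclusion pattern: +1 means "point is in the training set", -1 means "left out".
points = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

# Made-up stand-in for "model output on one test point" as a function of the subset:
# a linear part, one pairwise interaction, and a little noise.
rng = np.random.default_rng(0)
f_values = (
    points @ np.array([0.5, -0.2, 0.1, 0.3])
    + 0.4 * points[:, 0] * points[:, 1]
    + 0.05 * rng.standard_normal(len(points))
)

coeffs = fourier_coefficients(f_values, points)

# Linear datamodel: least-squares fit of f on [1, x_1, ..., x_n].
X = np.hstack([np.ones((len(points), 1)), points])
theta, *_ = np.linalg.lstsq(X, f_values, rcond=None)

# Residual error of the fitted linear datamodel.
residual_mse = float(np.mean((f_values - X @ theta) ** 2))

# Squared Fourier mass on sets of size >= 2.
high_order_mass = sum(c ** 2 for S, c in coeffs.items() if len(S) >= 2)

print("intercept vs f_hat(empty set):", theta[0], coeffs[()])
print("weights vs degree-1 coefficients:", theta[1:], [coeffs[(i,)] for i in range(n)])
print("residual MSE vs high-order mass:", residual_mse, high_order_mass)
```

In this toy setting the residual can be read off the Fourier spectrum without fitting the regression at all. The paper's estimator for realistic settings, where enumeration is impossible, is not reproduced here.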
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Nuclear Engineering Thermal-Hydraulics
Methods: Test · Linear Regression
