Understanding Influence Functions and Datamodels via Harmonic Analysis

Nikunj Saunshi; Arushi Gupta; Mark Braverman; Sanjeev Arora

arXiv:2210.01072 · cs.LG · October 4, 2022 · 1 citation

TL;DR

This paper uses harmonic analysis to build a theoretical understanding of influence functions and datamodels in deep learning. It gives an exact Fourier characterization of the learnt datamodel and an efficient way to estimate the residual error of the optimum linear datamodel.

Contribution

It offers a Fourier-based characterization of datamodels, an efficient method for estimating the residual error of the optimum linear datamodel without training it, and insights into when the influences of groups of datapoints sum linearly.

Findings

01. Fourier coefficients exactly characterize the learnt datamodel.

02. The residual error and quality of the optimum linear datamodel can be estimated efficiently without training the datamodel.

03. New insights into when group influences sum linearly.

Abstract

Influence functions estimate the effect of individual data points on a model's predictions on test data and were adapted to deep learning by Koh and Liang [2017]. They have been used for detecting data poisoning, identifying helpful and harmful examples, estimating the influence of groups of datapoints, etc. Recently, Ilyas et al. [2022] introduced a linear regression method, which they termed datamodels, to predict the effect of training points on outputs on test data. The current paper seeks to provide a better theoretical understanding of such interesting empirical phenomena. The primary tool is harmonic analysis and the idea of noise stability. Contributions include: (a) Exact characterization of the learnt datamodel in terms of Fourier coefficients. (b) An efficient method to estimate the residual error and quality of the optimum linear datamodel without having to train the datamodel. (c) New insights into…
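To make the link between linear datamodels and Fourier coefficients concrete, here is a minimal sketch of a standard fact from Boolean harmonic analysis. It is not the paper's own procedure. It assumes each training point is included independently with probability 1/2 (the uniform distribution on {-1, +1}^n), which may differ from the sampling scheme the paper analyzes. The function `toy_output` is a hypothetical stand-in for "model output on a test point after training on a given subset". Under these assumptions, the least-squares linear fit recovers the degree-0 and degree-1 Fourier coefficients, and its mean squared residual equals the total squared weight of the coefficients of degree 2 and higher.

```python
import itertools
import numpy as np

n = 4  # number of hypothetical training points

# All 2^n inclusion patterns: +1 means "point included", -1 means "excluded".
X = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

def toy_output(x):
    # Hypothetical test-point output as a function of the training subset.
    # The last two terms are interactions that no linear datamodel can capture.
    return 0.5 + 0.3 * x[0] - 0.2 * x[1] + 0.1 * x[2] * x[3] + 0.05 * x[0] * x[1] * x[2]

f = np.array([toy_output(x) for x in X])

def fourier_coefficient(S):
    # f_hat(S) = E_x[ f(x) * prod_{i in S} x_i ] under the uniform distribution.
    chi = np.prod(X[:, list(S)], axis=1) if S else np.ones(len(X))
    return float(np.mean(f * chi))

subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
f_hat = {S: fourier_coefficient(S) for S in subsets}

# Linear "datamodel": least-squares fit of f on [1, x_1, ..., x_n].
A = np.hstack([np.ones((len(X), 1)), X])
theta, *_ = np.linalg.lstsq(A, f, rcond=None)

# Degree-0 and degree-1 Fourier coefficients, in the same order as theta.
degree_le_1 = np.array([f_hat[()]] + [f_hat[(i,)] for i in range(n)])

# Residual of the best linear fit vs. Fourier weight at degree >= 2.
residual_mse = float(np.mean((A @ theta - f) ** 2))
high_degree_weight = sum(c ** 2 for S, c in f_hat.items() if len(S) >= 2)

print("linear fit:        ", np.round(theta, 6))
print("Fourier (deg <= 1):", np.round(degree_le_1, 6))
print("residual MSE:      ", residual_mse)
print("weight at deg >= 2:", high_degree_weight)
```

In this toy setting the residual is known from the Fourier weights alone, which illustrates why the quality of the best linear fit is a property of the function being fit rather than of the regression procedure. The sketch enumerates all 2^n subsets, so it is only feasible for very small n.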

Videos

Understanding Influence Functions and Datamodels via Harmonic Analysis · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Nuclear Engineering Thermal-Hydraulics

Methods: Test · Linear Regression