Statistical and Computational Guarantees for Influence Diagnostics

Jillian Fisher; Lang Liu; Krishna Pillutla; Yejin Choi; Zaid Harchaoui

arXiv:2212.04014 · stat.ML · September 21, 2023

TL;DR

This paper provides finite-sample statistical bounds and computational complexity bounds for two influence diagnostics, influence functions and approximate maximum influence perturbations, and illustrates them on generalized linear models and large attention-based models.

Contribution

It establishes the first finite-sample statistical bounds and computational complexity bounds for influence diagnostics computed with efficient inverse-Hessian-vector product methods.

Findings

01. Finite-sample statistical bounds for influence diagnostics.

02. Computational complexity bounds for influence functions.

03. Empirical validation on generalized linear and attention-based models.

Abstract

Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approximate maximum influence perturbations using efficient inverse-Hessian-vector product implementations. We illustrate our results with generalized linear models and large attention-based models on synthetic and real data.
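For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of an influence function computed through an inverse-Hessian-vector product. It is not the authors' implementation (see the repository listed under Code & Models). It assumes a ridge-regularized least-squares model, the standard definition of the influence of a point z on the estimator as -H^{-1} ∇ℓ(z, θ̂), and a hand-written conjugate gradient solver; all names (`conjugate_gradient`, `hvp`, `lam`, and so on) are hypothetical and chosen for illustration only.

```python
import numpy as np

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=1000):
    """Solve H x = b for symmetric positive definite H, given only v -> H v."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - H x, with x = 0
    p = r.copy()              # current search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Synthetic data (assumption: linear model with Gaussian noise).
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Empirical risk: (1/n) sum_i 0.5 * (x_i^T theta - y_i)^2 + 0.5 * lam * ||theta||^2.
# Its Hessian is H = X^T X / n + lam * I, and the minimizer solves H theta = X^T y / n.
H = X.T @ X / n + lam * np.eye(d)
theta_hat = np.linalg.solve(H, X.T @ y / n)

def hvp(v):
    """Hessian-vector product without forming H explicitly."""
    return X.T @ (X @ v) / n + lam * v

# Gradient of the (regularized) loss at a single datapoint z = (x_0, y_0).
x0, y0 = X[0], y[0]
grad_z = x0 * (x0 @ theta_hat - y0) + lam * theta_hat

# Influence of z on the estimator: -H^{-1} grad_z.
influence_cg = -conjugate_gradient(hvp, grad_z)
influence_direct = -np.linalg.solve(H, grad_z)  # reference, feasible only for small d
```

The direct solve is included only as a check on this small problem. In large models the Hessian cannot be formed or inverted explicitly, which is why iterative solvers that use only Hessian-vector products matter and why their computational cost is worth bounding.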

Code & Models

Repositories

jfisher52/influence_theory (PyTorch, official)

Taxonomy

Topics: Topological and Geometric Data Analysis · Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods