On the Accuracy of Newton Step and Influence Function Data Attributions
Ittai Rubinstein, Samuel B. Hopkins

TL;DR
This paper provides a new analysis of influence function and Newton step data attribution methods for convex learning, dropping the global strong convexity assumption that earlier analyses relied on and establishing asymptotic error bounds for logistic regressions.
Contribution
It introduces the first analysis of these attribution methods without relying on strong convexity, explaining their relative accuracy and deriving tight asymptotic error bounds.
Findings
Bounds are asymptotically tight up to poly-logarithmic factors.
Error scaling laws are established for well-behaved logistic regressions.
Newton Step is often more accurate than Influence Function in practice.
Abstract
Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretability and credit assignment to unlearning and privacy. Even in the relatively simple case of logistic regressions, existing mathematical analyses of leading data attribution methods such as Influence Functions (IF) and single Newton Step (NS) remain limited in two key ways. First, they rely on global strong convexity assumptions which are often not satisfied in practice. Second, the resulting bounds scale very poorly with the number of parameters and the number of samples removed. As a result, these analyses are not tight enough to answer fundamental questions such as "what is the asymptotic scaling of the errors of each method?" or "which of these methods is more accurate for a given…
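To make the two estimators named in the abstract concrete, here is a minimal NumPy sketch that compares them with exact retraining on a synthetic logistic regression. It is not the authors' code. It uses the standard textbook forms of the estimators: IF rescales the removed points' gradient by the inverse Hessian of the full objective, while NS takes one Newton step on the objective with the points removed. The data generator, the sizes n, d, k, and the helper names (fit, grad, hessian) are illustrative assumptions. The model is unregularized, which assumes the data are not linearly separable so that a finite minimizer exists.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, X, y):
    # Gradient of the summed logistic loss, labels y in {0, 1}.
    return X.T @ (sigmoid(X @ theta) - y)

def hessian(theta, X):
    # Hessian of the summed logistic loss: sum_i p_i (1 - p_i) x_i x_i^T.
    p = sigmoid(X @ theta)
    return (X * (p * (1.0 - p))[:, None]).T @ X

def fit(X, y, theta0, iters=50, tol=1e-10):
    # Plain Newton iterations (no line search); adequate for this
    # well-conditioned synthetic data, not a general-purpose solver.
    theta = theta0.copy()
    for _ in range(iters):
        step = np.linalg.solve(hessian(theta, X), grad(theta, X, y))
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Synthetic data (assumed sizes, chosen only for illustration).
rng = np.random.default_rng(0)
n, d, k = 2000, 5, 20
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d) / np.sqrt(d)
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

theta_hat = fit(X, y, np.zeros(d))             # model trained on all points

removed = rng.choice(n, size=k, replace=False)  # points to "unlearn"
keep = np.setdiff1d(np.arange(n), removed)

# Influence function: full-data Hessian applied to the removed gradients.
g_removed = grad(theta_hat, X[removed], y[removed])
theta_if = theta_hat + np.linalg.solve(hessian(theta_hat, X), g_removed)

# Single Newton step on the reduced objective, started at theta_hat.
theta_ns = theta_hat - np.linalg.solve(
    hessian(theta_hat, X[keep]), grad(theta_hat, X[keep], y[keep])
)

# Ground truth: retrain to convergence without the removed points.
theta_exact = fit(X[keep], y[keep], theta_hat)

change = np.linalg.norm(theta_exact - theta_hat)
err_if = np.linalg.norm(theta_if - theta_exact)
err_ns = np.linalg.norm(theta_ns - theta_exact)
print(f"true change {change:.3e}, IF error {err_if:.3e}, NS error {err_ns:.3e}")
```

The two estimators differ only in which Hessian is inverted. Varying k and d in this sketch is one informal way to see how each error grows with the number of removed samples and parameters. It does not reproduce the bounds in the paper.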
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
