HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
Xinyu Zhou, Simin Fan, Martin Jaggi

TL;DR
HyperINF introduces a novel influence function approximation method using the hyperpower technique, achieving high accuracy and efficiency on large models by leveraging low-rank Hessian approximations and Schulz's iterative algorithm.
Contribution
This paper presents HyperINF, a new influence estimation method that combines hyperpower iteration with low-rank Hessian approximation for scalable and accurate influence analysis.
Findings
HyperINF outperforms existing baselines in accuracy and stability.
HyperINF achieves minimal memory and computational overhead on large models.
HyperINF improves downstream tasks like data attribution and model fine-tuning.
Abstract
Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed for influence function approximation have significantly reduced the computational overhead. However, they mostly suffer from inaccurate estimation due to the lack of strong convergence guarantees from the algorithm. The family of hyperpower methods is well known for its rigorous convergence guarantees on matrix inverse approximation, but the matrix multiplication operation can involve intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method which leverages the hyperpower method, specifically Schulz's iterative algorithm. To deal with the…
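As a rough illustration of the Schulz iteration named in the abstract, the sketch below approximates a matrix inverse with the update X ← X(2I − AX). It is not the authors' implementation: the function name `schulz_inverse`, the iteration count, the damping value 0.1, and the toy matrix are all assumptions made for this example, and the low-rank handling of the Hessian described in the paper is not reproduced here.

```python
import numpy as np


def schulz_inverse(A: np.ndarray, num_iters: int = 30) -> np.ndarray:
    """Approximate inv(A) with Schulz's iteration X <- X (2I - A X)."""
    n = A.shape[0]
    identity = np.eye(n)
    # Classical initialisation X_0 = A^T / (||A||_1 * ||A||_inf); it keeps the
    # spectral radius of (I - X_0 A) below 1 for any non-singular A.
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(num_iters):
        # Each step only needs matrix multiplications, no explicit solve.
        X = X @ (2.0 * identity - A @ X)
    return X


# Toy example (hypothetical data): a damped Gram matrix of random "gradients".
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 32))
A = G @ G.T / 32 + 0.1 * np.eye(8)

X = schulz_inverse(A)
residual = np.linalg.norm(X @ A - np.eye(8))
print(f"residual ||XA - I||: {residual:.2e}")
```

The iteration squares the residual I − XA at every step, which is where its fast convergence comes from; the cost is two dense matrix products per step, which is the memory and compute burden the abstract refers to for large models.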
Peer Reviews
Decision · Submitted to ICLR 2025
- The problem that this paper addresses is a challenging one, and one of increasing importance/popularity in the community.
- The writing is pretty clear, and the experiments are well described.
- The proposed method is sound, and can potentially see adoption in the community/real world.
- The main contributions of this paper are not clearly disentangled from the overall story. In particular, my understanding is that the primary contribution of this paper is identifying that the Schulz method for the matrix inverse can be efficiently applied in this setting. The rest of the pipeline (Hessian inverse based attribution, Fisher Information Matrix, etc.) is borrowed from existing work in the field.
- I'm hesitant to use the term marginal/limited novelty as the authors have made an i
+ The computational time and memory usage for data attribution are greatly reduced by HYPERINF.
+ The effectiveness and efficiency are demonstrated on several tasks, showing the generality of HYPERINF.
- The novelty may be limited, in the sense that an existing numerical power method is applied to an existing problem (the identification of the challenge and the solution are still recognized).
- There are some experimental observations that are not explained well. See the questions.
Trending topic: the line of work on data attribution is very important, especially in the era of foundation models.
I am mainly concerned about the assumption made in Lemma 1. I cannot find any justification for the assumption of zero expectation and independence for gradient columns, where the randomness is taken over the label $y \sim p(y \mid x, \theta)$. However, this seems to be the key result of the paper (I don't think the application of Schulz's method for matrix inverse is very impressive). I also took a look at the proof of Lemma 1 and I find it poorly written, with typos like 'Var(g(
Taxonomy
Topics: Anomaly Detection Techniques and Applications
