Do Influence Functions Work on Large Language Models?
Zhe Li, Wei Zhao, Yige Li, Jun Sun

TL;DR
This paper systematically evaluates influence functions on large language models and finds that they perform poorly because of approximation errors in the iHVP estimate, uncertain convergence during fine-tuning, and a limitation in the definition itself, which suggests that alternative methods are needed.
Contribution
It provides the first comprehensive assessment of influence functions on LLMs, revealing their limitations and highlighting the necessity for new approaches.
Findings
Influence functions perform poorly on LLMs across multiple tasks.
Approximation errors and convergence issues hinder influence function effectiveness.
Changes in model parameters do not reliably indicate changes in LLM behavior.
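The last finding can be made concrete with a toy case. The sketch below is not one of the paper's experiments; it is an assumed illustration using a small two-layer ReLU network (the names `W1`, `W2`, `forward`, and the scale `c` are hypothetical). Scaling the first layer by `c` and the second by `1/c` moves the parameters a long way while leaving the outputs unchanged, so parameter distance alone says little about behavior.

```python
import numpy as np

# Hypothetical toy network: 4 inputs -> 8 ReLU units -> 3 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=(5, 4))  # a batch of 5 inputs


def forward(W1, W2, x):
    """Two-layer ReLU network without biases."""
    return np.maximum(x @ W1.T, 0.0) @ W2.T


# ReLU is positively homogeneous: relu(c * z) == c * relu(z) for c > 0,
# so rescaling the two layers in opposite directions cancels out.
c = 10.0
W1_s, W2_s = c * W1, W2 / c

# Euclidean distance between the two parameter settings.
param_shift = np.sqrt(np.sum((W1_s - W1) ** 2) + np.sum((W2_s - W2) ** 2))
# Largest change in any output entry.
output_shift = np.max(np.abs(forward(W1_s, W2_s, x) - forward(W1, W2, x)))

print(f"parameter shift: {param_shift:.3f}")
print(f"output shift:    {output_shift:.3e}")
```

The parameter shift is large while the output shift is at floating-point noise level. This covers only one direction of the mismatch (a large parameter change with no behavior change) and makes no claim about how LLMs behave under fine-tuning.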
Abstract
Influence functions are important for quantifying the impact of individual training data points on a model's predictions. Although extensive research has been conducted on influence functions in traditional machine learning models, their application to large language models (LLMs) has been limited. In this work, we conduct a systematic study to address a key question: do influence functions work on LLMs? Specifically, we evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings. Our further investigation reveals that their poor performance can be attributed to: (1) inevitable approximation errors when estimating the iHVP component due to the scale of LLMs, (2) uncertain convergence during fine-tuning, and, more fundamentally, (3) the definition itself, as changes in model parameters do not necessarily correlate with changes in LLM…
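For orientation, the classical influence-function score of a training example z on a test example z_test is -∇L(z_test)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the training loss; the product H⁻¹∇L(z) is the iHVP named in the abstract. The sketch below is a minimal illustration on a small regularized logistic regression, not the paper's setup: the model, the data, and the helper names (`fit`, `ihvp_exact`, `ihvp_neumann`) are assumptions made for the example. It compares an exact iHVP with a truncated iterative estimate, a simplified relative of the stochastic approximations used at scale.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def grad_loss(theta, x, y):
    """Gradient of the log-loss for one example (x, y), with y in {0, 1}."""
    return (sigmoid(x @ theta) - y) * x


def hessian(theta, X, lam):
    """Hessian of the mean log-loss plus (lam / 2) * ||theta||^2."""
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])


def fit(X, y, lam, steps=500, lr=1.0):
    """Plain gradient descent on the regularized mean log-loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * (X.T @ (sigmoid(X @ theta) - y) / len(X) + lam * theta)
    return theta


def ihvp_exact(H, v):
    """Exact inverse-Hessian-vector product; feasible only for small models."""
    return np.linalg.solve(H, v)


def ihvp_neumann(H, v, scale, steps):
    """Truncated Neumann-series estimate of H^{-1} v.

    Iterates est <- v + (I - H / scale) est, whose fixed point is
    scale * H^{-1} v. Converges when the eigenvalues of H lie in (0, 2 * scale).
    """
    est = v.copy()
    for _ in range(steps):
        est = v + est - H @ est / scale
    return est / scale


# Assumed toy data: 200 training points, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < sigmoid(X @ rng.normal(size=5))).astype(float)
lam = 0.1

theta = fit(X, y, lam)
H = hessian(theta, X, lam)
scale = np.linalg.eigvalsh(H).max()

g_train = grad_loss(theta, X[0], y[0])   # one training example
g_test = grad_loss(theta, X[1], y[1])    # stand-in for a test example

exact = -g_test @ ihvp_exact(H, g_train)
for steps in (5, 50, 500):
    approx = -g_test @ ihvp_neumann(H, g_train, scale, steps)
    print(f"steps={steps:4d}  influence={approx:+.6f}  |error|={abs(approx - exact):.2e}")
```

On this small convex problem the truncated estimate approaches the exact value as the number of steps grows. At LLM scale the exact solve is unavailable and only a truncated, sampled estimate is affordable, which is the source of the approximation error described in point (1) of the abstract.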
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
* The topics studied in this work are important and interesting. Identifying influential examples in model training is an important problem, and investigating and questioning influence-function-based methods from prior work may be of interest to many researchers.
* Comprehensive experiments cover various methods and applications and support the analysis of potential reasons behind the poor performance.
* In particular, I agree with the third reason given for the poor performance. Influ
* I have concerns with the paper presentation. I find this paper confusing at various points.
* Background information and experiment settings are not explained clearly.
* Tables and figures are not self-contained or explained clearly. The connection between the tables/figures and the main text is weak and should be strengthened.
* See my questions below. (Please be aware that influence functions are not my direct research area and I am only familiar with the basic concepts; I gave a confidenc
- Provides comprehensive empirical investigation of influence functions in LLMs
- Systematically analyzes and identifies key failure modes
- Offers both practical and theoretical insights into limitations
- Limited Motivation:
  - The complexity and computational aspects of iHVP approximation are not fully discussed
  - Benefits compared to existing data filtering methods (such as those in the Dolma paper: https://arxiv.org/pdf/2402.00159) are not well justified
- Methodological Clarity:
  - AdvBench task specifications could benefit from example illustrations
  - RepSim's methodology and its strong performance lack detailed explanation
  - The rationale for using different datasets for performance
The authors present a comprehensive set of experiments across diverse tasks to clearly demonstrate the ineffectiveness of influence functions when applied to LLMs. In addition to the empirical evidence, the paper offers hypotheses to explain the underlying causes of this ineffectiveness. Overall, the manuscript is well-structured and easy to follow, with the exploration of influence functions in the context of LLMs being both timely and compelling.
Major Concerns:
1. Lack of Critical Details: The paper is missing essential details, particularly in the experimental setup described in Section 3. The authors only mention the dataset and model used but do not explain how they compute the value of the influence function. This omission raises several questions:
   - Gradient Calculation: How did the authors compute the gradients? Specifically, did they treat all parameters across different layers as a single $\theta$, or did they calculate grad
- The paper is well-written and easy to follow.
- The plots and tables provide relevant and interesting data that support the conclusions.
- The analysis is thorough, with several interesting results.
- The code is included in the supplementary material, and after reviewing it, I am confident that the results can be reproduced, and the implementation is accurate.
- The paper presents only negative results without offering alternative approaches. For example, it would be beneficial to discuss alternative definitions that do not rely on gradients. While RepSim does not fit the traditional definition of gradient-based influence functions, it does show promising results and could be considered as an alternative. Thus, on one hand, the authors are calling for alternative methods to quantify influence, and on the other they are showing that RepSim, which is su
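The reviews refer to RepSim as a baseline that needs no gradients. As commonly described, it ranks training examples by the similarity of their hidden representations to that of a test example. The sketch below assumes cosine similarity over fixed-length vectors; which layer and pooling the paper uses is not stated in this excerpt, and the function name `repsim` and the toy vectors are hypothetical.

```python
import numpy as np


def repsim(train_reps, test_rep, eps=1e-12):
    """Cosine similarity between one test representation and each training one.

    train_reps: array of shape (n_train, d), one hidden vector per example.
    test_rep:   array of shape (d,).
    Returns an array of n_train scores in [-1, 1]; higher means more similar.
    """
    train_reps = np.asarray(train_reps, dtype=float)
    test_rep = np.asarray(test_rep, dtype=float)
    dots = train_reps @ test_rep
    norms = np.linalg.norm(train_reps, axis=1) * np.linalg.norm(test_rep)
    return dots / (norms + eps)  # eps guards against all-zero vectors


# Assumed toy representations; in practice these would come from a model's
# hidden states for each training and test example.
train_reps = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
test_rep = np.array([2.0, 0.0])

scores = repsim(train_reps, test_rep)
ranking = np.argsort(-scores)  # most similar training example first
print(scores, ranking)
```

Unlike the gradient-based score, this needs only forward passes and no Hessian, which is one reason it is inexpensive to compute.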
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
