Studying Large Language Model Generalization with Influence Functions
Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

TL;DR
This paper scales influence functions to large language models with up to 52 billion parameters using the EK-FAC approximation, then uses them to study which training sequences contribute most to a given model behavior and how the models generalize.
Contribution
It applies the existing EK-FAC approximation to make influence function computation tractable for LLMs, and uses the resulting estimates to analyze the models' generalization behaviors and their limits.
Findings
EK-FAC achieves accuracy similar to traditional influence function estimators.
Influence patterns are sparse, and the patterns of generalization become more abstract with scale.
Influence decays to near zero when the order of key phrases is flipped, a surprising limitation of the models' generalization.
Abstract
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP…
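The counterfactual described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's method: it uses a two-parameter least-squares model whose damped Hessian is small enough to invert exactly, whereas the paper approximates the IHVP with EK-FAC for LLMs. All function names, the damping value, and the data are hypothetical and chosen only for illustration. The influence of a candidate training example on a query measurement is estimated as the negative inner product between the IHVP of the query gradient and the candidate's loss gradient.

```python
# Toy influence-function sketch (assumption: exact 2x2 Hessian, not EK-FAC).
# Model: prediction = theta[0]*x[0] + theta[1]*x[1], loss = 0.5 * residual^2.

def grad_loss(theta, x, y):
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    residual = theta[0] * x[0] + theta[1] * x[1] - y
    return [residual * x[0], residual * x[1]]

def damped_hessian(xs, damping):
    """Mean of x x^T over the training inputs, plus damping on the diagonal."""
    n = len(xs)
    h00 = sum(x[0] * x[0] for x in xs) / n + damping
    h01 = sum(x[0] * x[1] for x in xs) / n
    h11 = sum(x[1] * x[1] for x in xs) / n + damping
    return [[h00, h01], [h01, h11]]

def solve_2x2(h, v):
    """Return H^{-1} v for a 2x2 matrix H (the IHVP in this toy setting)."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [
        (h[1][1] * v[0] - h[0][1] * v[1]) / det,
        (-h[1][0] * v[0] + h[0][0] * v[1]) / det,
    ]

def influence(theta, xs, query_grad, candidate, damping=1e-3):
    """Estimate how upweighting `candidate` would change the query measurement.

    query_grad is the gradient of the measurement with respect to theta;
    candidate is an (x, y) pair. Positive values mean the measurement rises.
    """
    ihvp = solve_2x2(damped_hessian(xs, damping), query_grad)
    g = grad_loss(theta, *candidate)
    return -(ihvp[0] * g[0] + ihvp[1] * g[1])

# Hypothetical usage: the measurement is theta[0] itself, and the candidate
# example pulls theta[0] upward, so its influence is positive.
train_inputs = [(1.0, 0.0), (0.0, 1.0)]
score = influence([0.0, 0.0], train_inputs, [1.0, 0.0], ((1.0, 0.0), 1.0), damping=0.0)
print(score)
```

At LLM scale the `solve_2x2` step is the bottleneck, since the Hessian cannot be formed or inverted directly; that is the role the abstract assigns to the EK-FAC approximation.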
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
