Studying Large Language Model Generalization with Influence Functions

Roger Grosse; Juhan Bae; Cem Anil; Nelson Elhage; Alex Tamkin; Amirhossein Tajdini; Benoit Steiner; Dustin Li; Esin Durmus; Ethan Perez; Evan Hubinger; Kamilė Lukošiūtė; Karina Nguyen; Nicholas Joseph; Sam McCandlish; Jared Kaplan; Samuel R. Bowman

arXiv:2308.03296 · cs.LG · August 8, 2023 · 25 cites

Open Access · 3 Repos

TL;DR

This paper scales influence functions to large language models with up to 52 billion parameters using the EK-FAC approximation, then uses them to study which training sequences most contribute to a given model behavior and where that generalization breaks down.

Contribution

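It applies the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to make the inverse-Hessian-vector product tractable for LLMs, and uses the resulting influence estimates to analyze their generalization behaviors and limitations.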
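To illustrate why a Kronecker-factored curvature approximation makes the inverse-Hessian-vector product cheap, the following is a minimal NumPy sketch, not the paper's implementation. It assumes one layer's curvature block is approximated as A ⊗ S, with A and S small symmetric positive semi-definite factors, and solves the damped system in the Kronecker eigenbasis without ever forming the full matrix. The function name `kron_damped_solve`, the shapes, and the damping value are hypothetical. As we understand it, EK-FAC keeps the same eigenbasis but replaces the product eigenvalues with values refit in that basis; here the plain product eigenvalues stand in for them.

```python
import numpy as np

def kron_damped_solve(A, S, V, damping):
    """Return U with vec(U) = (A kron S + damping * I)^{-1} vec(V).

    vec() stacks columns (column-major order).
    A: (d_in, d_in) symmetric PSD, S: (d_out, d_out) symmetric PSD,
    V: (d_out, d_in), e.g. a gradient shaped like the layer's weight matrix.
    """
    lam_A, Q_A = np.linalg.eigh(A)
    lam_S, Q_S = np.linalg.eigh(S)
    # Rotate V into the Kronecker eigenbasis (Q_A kron Q_S).
    V_rot = Q_S.T @ V @ Q_A
    # Eigenvalues of A kron S, arranged to match the entries of V_rot.
    # Assumption: an EK-FAC-style method would substitute refit values here.
    eigvals = np.outer(lam_S, lam_A)
    U_rot = V_rot / (eigvals + damping)
    # Rotate back to the original basis.
    return Q_S @ U_rot @ Q_A.T

# Toy factors built from random "activations" and "pre-activation gradients".
rng = np.random.default_rng(0)
acts = rng.normal(size=(50, 4))    # d_in = 4
grads = rng.normal(size=(50, 3))   # d_out = 3
A = acts.T @ acts / 50
S = grads.T @ grads / 50
V = rng.normal(size=(3, 4))
damping = 1e-3

U = kron_damped_solve(A, S, V, damping)
```

The solve touches only the two small factors, so its cost scales with the cubes of d_in and d_out rather than with the cube of their product, which is what makes the approach plausible at LLM layer sizes.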

Findings

01

EK-FAC achieves accuracy similar to traditional influence function estimators.

02

Influence patterns are sparse, and the most influential sequences become more abstract as models scale.

03

Influence decays sharply when the order of key phrases is flipped, revealing a limitation in how the models generalize.

Abstract

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP…
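The counterfactual described in the abstract can be made concrete on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's method: it uses ridge regression, where the Hessian is small enough to solve exactly, rather than an LLM with an approximated IHVP. It computes the predicted parameter change from adding a training example with a small weight, as the negative inverse Hessian applied to that example's loss gradient, and compares it with actually refitting. All names (`fit`, `influence_on_params`, `lam`, `eps`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit(X, y, lam, extra=None, eps=0.0):
    """Minimize mean 0.5*(x @ theta - y)^2 + (lam/2)*||theta||^2.

    If extra=(x_m, y_m) is given, its loss is added with weight eps.
    """
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    if extra is not None:
        x_m, y_m = extra
        H = H + eps * np.outer(x_m, x_m)
        b = b + eps * y_m * x_m
    return np.linalg.solve(H, b)

def influence_on_params(X, y, lam, x_m, y_m):
    """First-order change in the optimum per unit weight on (x_m, y_m)."""
    n, d = X.shape
    theta = fit(X, y, lam)
    H = X.T @ X / n + lam * np.eye(d)      # Hessian of the training objective
    grad_m = (x_m @ theta - y_m) * x_m     # gradient of the example's loss
    return -np.linalg.solve(H, grad_m)     # the inverse-Hessian-vector product

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
x_m, y_m = rng.normal(size=5), 1.0
lam = 1e-2

predicted = influence_on_params(X, y, lam, x_m, y_m)

# Counterfactual check: refit with the example added at a small weight.
eps = 1e-5
actual = (fit(X, y, lam, (x_m, y_m), eps) - fit(X, y, lam)) / eps
```

Here `predicted` and `actual` agree closely because the model is tiny and the Hessian is solved exactly. For an LLM that exact solve is the infeasible step, which is why an approximation such as EK-FAC is needed.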

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification