Locating and Editing Factual Associations in GPT
Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov

TL;DR
This paper investigates how factual knowledge is stored in GPT models, identifies the specific neural computations responsible for recalling it, and introduces a method to edit individual factual associations directly.
Contribution
It finds evidence that factual associations correspond to localized, directly editable computations in middle-layer feed-forward modules, and proposes Rank-One Model Editing (ROME), a method for precisely editing individual facts in a model's weights.
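To make the idea of a rank-one edit concrete, here is a minimal sketch in plain Python. It assumes the linear associative-memory view of a feed-forward projection, W k ≈ v, where a key k encodes the subject and v encodes the recalled fact. The closed-form update W' = W + Λ(C⁻¹k*)ᵀ, with Λ = (v* − W k*) / ((C⁻¹k*)ᵀ k*) and C = K Kᵀ built from previously seen keys, forces W' k* = v* while changing W along a single direction. The matrices are hypothetical toy values, the helper names are our own, and C is assumed invertible; this is an illustration, not the authors' implementation.

```python
# Sketch of a ROME-style rank-one edit on toy matrices (hypothetical values).
# W is viewed as a key-value memory: W @ k ~= v.

def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A invertible)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def key_covariance(keys):
    """C = K K^T, treating each stored key as a column of K."""
    d = len(keys[0])
    return [[sum(k[i] * k[j] for k in keys) for j in range(d)] for i in range(d)]

def rank_one_edit(W, C, k_star, v_star):
    """Return W + Lambda (C^-1 k*)^T so that the edited W maps k* to v*."""
    u = solve(C, k_star)                                   # u = C^-1 k*
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]
    scale = dot(u, k_star)                                 # (C^-1 k*)^T k*
    lam = [r / scale for r in residual]                    # Lambda
    return [[w + l * ui for w, ui in zip(row, u)] for row, l in zip(W, lam)]

# Usage with toy data: insert the new association k* -> v*.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
C = key_covariance(keys)
k_star = [0.5, 0.2, 1.0]
v_star = [3.0, -2.0]
W_hat = rank_one_edit(W, C, k_star, v_star)
print([round(x, 6) for x in matvec(W_hat, k_star)])        # the edited memory now returns v*
```

Because the update is the outer product of one column Λ and one row (C⁻¹k*)ᵀ, it has rank one, and any key k with kᵀC⁻¹k* = 0 is mapped exactly as before. In the actual method, k* and v* are computed from the model itself, a step this sketch omits.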
Findings
ROME effectively updates factual associations in GPT models
Mid-layer feed-forward modules are key to storing factual knowledge
ROME maintains both specificity and generalization when editing counterfactual assertions
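The distinction between generalization and specificity in these findings can be sketched as three success rates, assuming per-prompt model probabilities for the new (counterfactual) object and the original object are already available. In the style of comparison-based scores used for counterfactual edits, an edit succeeds on the rewrite prompt and on its paraphrases when the new object outweighs the original one, while specificity is checked on prompts about other subjects, where the original object should still win. The class, helper names, and probabilities below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PromptProbs:
    p_new: float  # model probability of the edited (counterfactual) object
    p_old: float  # model probability of the original object

def fraction(flags):
    """Share of True values; NaN for an empty collection."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")

def edit_scores(rewrite, paraphrases, neighbors):
    """Hypothetical success rates for a single edit."""
    return {
        # efficacy: the edited prompt itself now prefers the new object
        "efficacy": fraction(p.p_new > p.p_old for p in rewrite),
        # generalization: paraphrases about the same subject also prefer it
        "paraphrase": fraction(p.p_new > p.p_old for p in paraphrases),
        # specificity: prompts about other subjects keep their original object
        "neighborhood": fraction(p.p_old > p.p_new for p in neighbors),
    }

scores = edit_scores(
    rewrite=[PromptProbs(0.62, 0.05)],
    paraphrases=[PromptProbs(0.40, 0.10), PromptProbs(0.08, 0.30)],
    neighbors=[PromptProbs(0.02, 0.70), PromptProbs(0.01, 0.55), PromptProbs(0.45, 0.20)],
)
print(scores)
```

An edit that simply overwrote everything would score well on the first two rates but poorly on the neighborhood rate, which is why both kinds of score are needed.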
Abstract
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual…
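The causal intervention mentioned in the abstract (causal tracing) can be illustrated on a toy model. Run the model on clean input, corrupt the subject-token embeddings with Gaussian noise, then restore one clean hidden state (layer, token) at a time inside the corrupted run and record how much of the correct-answer probability returns. That recovered probability is the indirect effect. The sketch below rests on stated assumptions: `toy_forward` is a hypothetical four-layer causal network standing in for GPT, and the noise scale, sample count, and embeddings are arbitrary. In a real model the same loop would patch transformer hidden states, for example via forward hooks.

```python
import math
import random

def toy_forward(embeds, n_layers=4, patch=None):
    """Tiny causal network (hypothetical stand-in for a transformer).

    patch: optional {(layer, pos): value} overwriting hidden states, used to
    restore clean activations inside a corrupted run.
    Returns (states, prob): states[l][i] is token i's state after layer l.
    """
    patch = patch or {}
    h = list(embeds)
    states = []
    for layer in range(n_layers):
        new_h = []
        for i in range(len(h)):
            context = sum(h[: i + 1]) / (i + 1)      # mix only earlier tokens (causal)
            value = math.tanh(0.9 * h[i] + 0.8 * context)
            new_h.append(patch.get((layer, i), value))
        h = new_h
        states.append(h)
    prob = 1.0 / (1.0 + math.exp(-4.0 * h[-1]))      # toy "correct answer" probability
    return states, prob

def causal_trace(embeds, subject_positions, noise=3.0, n_layers=4, samples=10, seed=0):
    """Average indirect effect of restoring each (layer, token) clean state."""
    rng = random.Random(seed)
    clean_states, p_clean = toy_forward(embeds, n_layers)
    effects = [[0.0] * len(embeds) for _ in range(n_layers)]
    p_corrupt_total = 0.0
    for _ in range(samples):
        corrupted = list(embeds)
        for i in subject_positions:
            corrupted[i] += rng.gauss(0.0, noise)    # obscure the subject tokens
        _, p_corrupt = toy_forward(corrupted, n_layers)
        p_corrupt_total += p_corrupt
        for layer in range(n_layers):
            for pos in range(len(embeds)):
                restore = {(layer, pos): clean_states[layer][pos]}
                _, p_restored = toy_forward(corrupted, n_layers, patch=restore)
                effects[layer][pos] += (p_restored - p_corrupt) / samples
    return p_clean, p_corrupt_total / samples, effects

embeds = [0.8, 0.6, -0.2, 0.1, 0.3]                  # hypothetical token embeddings
p_clean, p_corrupt, effects = causal_trace(embeds, subject_positions=[0, 1])
print(f"clean p={p_clean:.3f}, corrupted p={p_corrupt:.3f}")
for layer, row in enumerate(effects):
    print(layer, " ".join(f"{e:+.3f}" for e in row))
```

Because the toy network only reads earlier tokens, restoring any state that precedes the corrupted subject has zero effect, and restoring the last token's final state recovers the clean probability exactly. In a real model, the informative part of the grid is which intermediate cells carry the effect, such as middle layers at the last subject token.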
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Text Readability and Simplification
Methods: Rank-One Model Editing
