Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Peter Hase; Mohit Bansal; Been Kim; Asma Ghandeharioun

arXiv:2301.04213 · cs.LG · October 17, 2023 · 22 cites

TL;DR

This paper tests whether localization methods such as Causal Tracing identify the weights that should be edited to change a fact in a language model. It finds that they do not, which challenges the assumption that knowing where a fact is stored tells us where to edit it.

Contribution

The paper demonstrates that localization results from Causal Tracing do not reliably indicate which MLP layers to edit, which calls into question the usefulness of current localization techniques for guiding model editing.

Findings

01. Localization from Causal Tracing does not predict which layers to edit.

02. Layer choice is a better predictor of editing success than localization results.

03. Better mechanistic understanding does not always improve editing strategies.

Abstract

Language models learn a great quantity of factual information during pretraining, and recent work localizes this information to specific model weights like mid-layer MLP weights. In this paper, we find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored. This is surprising because we would expect that localizing facts to specific model parameters would tell us where to manipulate knowledge in models, and this assumption has motivated past work on model editing methods. Specifically, we show that localization conclusions from representation denoising (also known as Causal Tracing) do not provide any insight into which model MLP layer would be best to edit in order to override an existing stored fact with a new one. This finding raises questions about how past work relies on…
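The representation-denoising idea mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the model is a small random residual network rather than a language model, and the names `make_toy_model`, `forward`, and `tracing_effects` are hypothetical, not taken from the authors' repository. The sketch runs a clean input, runs a noised copy of it, and then measures how much of the target probability returns when a single layer's clean MLP output is restored in the noised run.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_toy_model(n_layers=6, dim=8, n_classes=4, seed=0):
    """Random residual 'MLP' layers plus a linear readout (toy stand-in, not an LM)."""
    rng = random.Random(seed)
    layers = [[[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
              for _ in range(n_layers)]
    readout = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]
    return layers, readout


def forward(x, layers, readout, patch=None):
    """Run the toy model; `patch` maps a layer index to an MLP output to substitute."""
    patch = patch or {}
    h = list(x)
    mlp_outputs = []
    for idx, w in enumerate(layers):
        if idx in patch:
            m = patch[idx]  # restored (clean) MLP output
        else:
            m = [math.tanh(v) for v in matvec(w, h)]
        mlp_outputs.append(m)
        h = [a + b for a, b in zip(h, m)]  # residual update
    return softmax(matvec(readout, h)), mlp_outputs


def tracing_effects(x, target, layers, readout, noise_scale=1.0, seed=1):
    """Per-layer effect: p(target | noised input, layer restored) - p(target | noised input)."""
    rng = random.Random(seed)
    _, clean_mlp = forward(x, layers, readout)
    x_noisy = [v + noise_scale * rng.gauss(0, 1) for v in x]
    p_noisy, _ = forward(x_noisy, layers, readout)
    effects = []
    for idx in range(len(layers)):
        p_restored, _ = forward(x_noisy, layers, readout, patch={idx: clean_mlp[idx]})
        effects.append(p_restored[target] - p_noisy[target])
    return effects


layers, readout = make_toy_model()
x = [random.Random(2).gauss(0, 1) for _ in range(8)]
clean_probs, _ = forward(x, layers, readout)
target = clean_probs.index(max(clean_probs))  # treat the clean top class as the "fact"
effects = tracing_effects(x, target, layers, readout)
print([round(e, 3) for e in effects])
```

A layer with a large effect is the kind of location that localization would flag. The paper's question is whether such a layer is also the best one to edit, which this toy sketch does not test.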

Code & Models

Repositories

google/belief-localization (PyTorch, official)

Videos

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)