Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
Shrestha Datta, Hongfu Liu, Anshuman Chhabra

TL;DR
This paper introduces the concept of 'golden layers' in large language models, proposing a new method called Layer Gradient Analysis (LGA) to identify them for improved knowledge editing.
Contribution
The work hypothesizes and empirically validates the existence of fixed golden layers that achieve near-optimal knowledge editing performance, and introduces LGA to identify these layers efficiently.
Findings
Golden layers can achieve near-optimal editing performance, close to that of sample-wise optimal layers.
Golden layers identified via proxy datasets generalize well to unseen queries.
The LGA method is effective and robust across different models and editing techniques.
Abstract
Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a specific query to a desired target while preserving its behavior on all other inputs. This process typically involves two stages: identifying the layer to edit and performing the parameter update. Intuitively, different queries may localize knowledge at different depths of the model, resulting in different sample-wise editing performance for a fixed editing layer. In this work, we hypothesize the existence of fixed golden layers that can achieve near-optimal editing performance similar to sample-wise optimal layers. To validate this hypothesis, we provide empirical evidence by comparing golden layers against ground-truth sample-wise optimal layers. Furthermore, we show that golden layers can be reliably identified using a proxy dataset and generalize effectively to unseen test set queries…
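The comparison the abstract describes, a single fixed layer versus the best layer chosen per sample, can be illustrated with a small sketch. This is not the paper's LGA method, which is not specified in this summary; it is a brute-force selection over a hypothetical score table, where `scores[i][l]` is assumed to be the editing score of sample `i` when layer `l` is edited (higher is better). All function names and numbers below are invented for illustration.

```python
from statistics import mean


def samplewise_optimal_layers(scores):
    """For each sample (row), return the index of its best-scoring layer."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def pick_fixed_layer(proxy_scores):
    """Return the single layer with the highest mean score over the proxy set."""
    num_layers = len(proxy_scores[0])
    layer_means = [mean(row[l] for row in proxy_scores) for l in range(num_layers)]
    return max(range(num_layers), key=layer_means.__getitem__)


def mean_score_at(scores, layer):
    """Mean score when every sample is edited at the same fixed layer."""
    return mean(row[layer] for row in scores)


def oracle_mean_score(scores):
    """Mean score when every sample is edited at its own best layer."""
    return mean(max(row) for row in scores)


# Toy score tables (assumed values, 3 layers): rows are samples, columns are layers.
proxy_scores = [
    [0.2, 0.8, 0.6],
    [0.1, 0.7, 0.9],
    [0.3, 0.9, 0.5],
]
test_scores = [
    [0.2, 0.9, 0.7],
    [0.4, 0.6, 0.8],
]

# Choose one fixed layer on the proxy set, then measure how far it falls
# short of the per-sample oracle on held-out test samples.
golden = pick_fixed_layer(proxy_scores)
gap = oracle_mean_score(test_scores) - mean_score_at(test_scores, golden)
print(f"fixed layer: {golden}, gap to sample-wise oracle: {gap:.2f}")
```

Filling such a table requires running an edit at every layer for every sample, which is the cost that an efficient identification method would aim to avoid. The golden-layer hypothesis corresponds to the gap between the fixed layer and the per-sample oracle staying small on unseen queries.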
