The Anatomy of an Edit: Mechanism-Guided Activation Steering for Knowledge Editing

Yuan Cao; Mingyang Wang; Hinrich Schütze

arXiv:2603.20795 · cs.CL · March 24, 2026

Open Access

TL;DR

This paper investigates the internal mechanisms of knowledge editing in large language models using neuron-level attribution, and introduces MEGA, a new method that improves editing accuracy by targeting specific attention regions without altering model weights.

Contribution

It provides a mechanistic understanding of how edits are implemented inside LLMs and introduces MEGA, a novel, weight-free activation steering method based on post-edit attribution.

Findings

1. Mid-to-late attention promotes new facts during edits.

2. Attention and FFN modules work together to suppress old facts.

3. MEGA outperforms existing KE methods on multiple benchmarks.

Abstract

Large language models (LLMs) are increasingly used as knowledge bases, but keeping them up to date requires targeted knowledge editing (KE). However, it remains unclear how edits are implemented inside the model once applied. In this work, we take a mechanistic view of KE using neuron-level knowledge attribution (NLKA). Unlike prior work that focuses on pre-edit causal tracing and localization, we use post-edit attribution -- contrasting successful and failed edits -- to isolate the computations that shift when an edit succeeds. Across representative KE methods, we find a consistent pattern: mid-to-late attention predominantly promotes the new target, while attention and FFN modules cooperate to suppress the original fact. Motivated by these findings, we propose MEGA, a MEchanism-Guided Activation steering method that performs attention-residual interventions in attribution-aligned…
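
To make the idea of a weight-free, attention-residual intervention concrete, here is a minimal toy sketch in pure Python. It is not the MEGA method itself, whose details are not given in this summary. The function `forward`, the steering vector `steer_vec`, the layer set `steer_layers`, the strength `alpha`, and the two-dimensional "old"/"new" readout are all hypothetical choices made for illustration. The sketch only shows the generic mechanism: a fixed vector is added to the attention output of selected layers before it enters the residual stream, and no layer parameters are changed.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Tuple[Callable[[Vector], Vector], Callable[[Vector], Vector]]  # (attention, ffn)


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(v: Vector, alpha: float) -> Vector:
    return [alpha * x for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def forward(
    x: Vector,
    layers: Sequence[Layer],
    steer_layers: Sequence[int] = (),
    steer_vec: Optional[Vector] = None,
    alpha: float = 1.0,
) -> Vector:
    """Run a toy residual stack.

    In the layers listed in `steer_layers`, alpha * steer_vec is added to the
    attention output before it is written to the residual stream. The layer
    functions themselves (the "weights") are never modified.
    """
    h = list(x)
    for i, (attn, ffn) in enumerate(layers):
        a = attn(h)
        if steer_vec is not None and i in steer_layers:
            a = add(a, scale(steer_vec, alpha))  # the steering intervention
        h = add(h, a)       # residual update from attention
        h = add(h, ffn(h))  # residual update from the FFN
    return h


# Toy model (assumption): 4 layers, attention scales the state, FFN is a no-op.
toy_layers: List[Layer] = [
    (lambda h: [0.1 * v for v in h], lambda h: [0.0 for _ in h])
    for _ in range(4)
]

# Hypothetical readout directions for the original and the edited answer.
old_dir: Vector = [1.0, 0.0]
new_dir: Vector = [0.0, 1.0]

x0: Vector = [1.0, 0.0]
baseline = forward(x0, toy_layers)
steered = forward(x0, toy_layers, steer_layers=(2, 3), steer_vec=new_dir, alpha=0.5)

baseline_new = dot(baseline, new_dir)
steered_new = dot(steered, new_dir)
baseline_old = dot(baseline, old_dir)
steered_old = dot(steered, old_dir)
```

In this toy setup, steering only the later layers (indices 2 and 3) raises the readout along `new_dir` while leaving the readout along `old_dir` unchanged, because the steering vector is orthogonal to `old_dir`. In a real transformer, the same pattern is typically implemented with forward hooks on the attention modules, and the choice of layers and direction would come from an attribution analysis rather than being fixed by hand.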

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Advanced Graph Neural Networks · Topic Modeling