Norm Anchors Make Model Edits Last
Mingda Liu, Zhenghan Zhu, Ze'an Miao, Katsuki Fujisawa

TL;DR
This paper identifies a positive norm-feedback loop that causes sequential model editing to fail and introduces Norm-Anchor Scaling (NAS), a simple plug-in stabilizer that extends the usable editing horizon by more than 4x and improves long-run editing performance by 72.2% on average.
Contribution
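The paper formalizes the norm-feedback loop behind sequential editing failures, in which solved value vectors and edited MLP weights progressively amplify each other, and proposes NAS, a lightweight method that breaks the loop by rescaling each solved value vector to an original-model reference norm.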
Findings
NAS extends the usable editing horizon by more than 4x.
NAS improves long-run editing performance by 72.2% on average.
NAS preserves single-edit efficacy with negligible overhead.
Abstract
Sequential Locate-and-Edit (L&E) model editing can fail abruptly after many edits. We identify and formalize this failure as a positive norm-feedback loop, in which solved value vectors and edited MLP weights progressively amplify each other, degrading edit quality and eventually collapsing model capabilities. Our analysis shows that this feedback can yield approximately exponential norm growth under standard L&E dynamics, and can remain unresolved by existing increment-level regularizers or update clamps. We propose Norm-Anchor Scaling (NAS), a plug-in stabilizer that breaks this loop by rescaling each solved value vector to an original-model reference norm. Across multiple LLM backbones, datasets, and L&E editors, NAS extends the usable editing horizon by more than 4x and improves long-run editing performance by 72.2% on average, while preserving single-edit efficacy, with only a…
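The core operation the abstract describes, rescaling a solved value vector to a reference norm, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `norm_anchor_scale`, the `eps` guard, and the choice to rescale unconditionally are assumptions, and the abstract does not say how the original-model reference norm is measured, so it is simply passed in as a parameter here.

```python
import numpy as np

def norm_anchor_scale(value, reference_norm, eps=1e-12):
    """Rescale `value` so that its L2 norm equals `reference_norm`.

    Hypothetical helper: `reference_norm` stands in for a norm measured on
    the original, unedited model. How that norm is obtained is not given
    in the source and is assumed to be supplied by the caller.
    """
    value = np.asarray(value, dtype=float)
    norm = np.linalg.norm(value)
    if norm < eps:
        # A (near-)zero vector has no direction to preserve; return it unchanged.
        return value.copy()
    # Keep the direction of the solved vector, replace only its magnitude.
    return value * (reference_norm / norm)

# A solved value vector whose norm has grown to 50 is pulled back to norm 5.
solved = np.array([30.0, 40.0])
anchored = norm_anchor_scale(solved, reference_norm=5.0)
```

Because only the magnitude changes, the direction found by the editor's solve is kept. In this sketch, that is what stops the vector norm from feeding growth back into later edits.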
