Revealing and Mitigating Over-Attention in Knowledge Editing

Pinzheng Wang; Zecheng Tang; Keyan Zhou; Juntao Li; Qiaoming Zhu; Min Zhang

arXiv:2502.14838·cs.CL·February 21, 2025

TL;DR

This paper identifies that over-attention in language models causes knowledge editing errors and proposes SADR, a regularization method to restrict attention shifts, effectively reducing such errors across multiple models.

Contribution

The paper introduces SADR, a novel regularization technique that mitigates over-attention during knowledge editing, improving model reliability and preserving pre-existing knowledge.
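To make the idea of "restricting attention shifts" concrete, here is a minimal sketch of an attention-drift penalty. It is not the authors' implementation: the function names, the use of KL divergence, and the rule for selecting heads (penalize only heads whose attention to the subject token grew after editing) are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_drift_penalty(attn_before, attn_after, subject_idx):
    """Hypothetical drift penalty (illustrative, not the paper's exact loss).

    attn_before / attn_after: one attention distribution per head, taken at
    the last token over the context positions, before and after the edit.
    subject_idx: position of the edited subject token in the context.

    Assumption: only heads whose attention to the subject increased are
    penalized, so heads that did not over-focus are left unconstrained.
    """
    penalty = 0.0
    for before, after in zip(attn_before, attn_after):
        if after[subject_idx] > before[subject_idx]:
            penalty += kl_divergence(before, after)
    return penalty


def regularized_edit_loss(edit_loss, attn_before, attn_after, subject_idx, gamma=0.1):
    """Editing objective plus the drift penalty; gamma is a hypothetical weight."""
    return edit_loss + gamma * attention_drift_penalty(attn_before, attn_after, subject_idx)
```

Under these assumptions, a head that keeps its original attention pattern contributes nothing, while a head that shifts most of its attention mass onto the subject token adds a positive term that the editing optimizer must trade off against the edit loss.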

Findings

01

SADR significantly reduces Specificity Failure in knowledge editing.

02

The method is effective across five large language models.

03

SADR maintains model performance while preventing attention over-focus.

Abstract

Large Language Models have demonstrated superior performance across a wide range of tasks, but they still exhibit undesirable errors due to incorrect knowledge learned from the training data. To address this, knowledge editing methods have emerged to precisely edit specific model knowledge by efficiently modifying a very small percentage of parameters. However, those methods can lead to the problem of Specificity Failure, where existing knowledge and capabilities are severely degraded by editing. Our preliminary analysis indicates that Specificity Failure primarily stems from the model's attention heads assigning excessive attention scores to entities related to the edited knowledge,…

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

The paper impresses with its consistently comprehensible and stringent argumentation. The authors start with a problem of a current methodology, prove that this problem exists, identify the underlying significant cause and can thus propose a solution method for the problem. The paper is comprehensibly written and error-free throughout, the illustrations and tables are helpful and well chosen. An additional plus is the ablation study, which deals with the trade-off between editing success and spe

Weaknesses

A look at the appendix shows that the experiments for this article were much more extensive than stated in the actual paper. In addition to further details and results of the experiments described, further results for additional editing methods (WISE, MEND) and additional data sets can be found here. A human evaluation is also attached. It is a pity that even the section on limitations and future work did not find space in the main text. A minor weakness of the paper could be that it is not made

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Specificity is an important problem in knowledge editing, and the proposed method can effectively alleviate it.
2. The authors consider the specificity problem comprehensively and conduct a thorough evaluation of SADR against existing methods and models, providing a comprehensive analysis of its performance.

Weaknesses

1. According to the experimental results, the proposed method leads to a drop in generalization, which is an important metric in knowledge editing. In my view, this drop may be caused by the attention-learning method, as it would make the model focus less on the subject in other contexts. This drawback weakens the contribution of the method.
2. Although the proposed method demonstrates good performance under the specificity metric, I'm not that convinced by the analysis a

Reviewer 03 · Rating 8 · Confidence 4

Strengths

* The paper is well-motivated: it explores the reasons behind the Specificity Failure observed in edited models and proposes an effective solution to address this issue.
* SADR is generalizable: by incorporating an additional loss function, SADR can be applied to various knowledge editing techniques.
* The article is well-structured: it first identifies specificity failure through guided experiments and then delves into its causes. Finally, the paper proposes a solution.

Weaknesses

**Main Weaknesses**
* *W1*: I suggest conducting additional experiments on MQuAKE [1] to prove the effectiveness of the method. Recent research [1] has shown that existing knowledge editing methods are not good at multi-hop editing. For example, when we edit a piece of knowledge from *<CountryX, Prime_Minister, PersonY>* to *<CountryX, Prime_Minister, PersonZ>*, the corresponding knowledge *<CountryX, First_Lady, PersonY's wife>* should also be changed to *<CountryX, First_Lady, PersonZ's wife

Code & Models

Repositories

PinzhengWang322/Reveal_Attention_Drift
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Softmax · Attention Is All You Need · Focus