Should We Really Edit Language Models? On the Evaluation of Edited Language Models
Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu

TL;DR
This paper evaluates a range of model-editing methods across several language models and finds that current techniques often degrade general performance and safety, especially as the number of edits grows, highlighting the need for more reliable editing approaches.
Contribution
It provides a comprehensive evaluation of existing editing methods across different models, exposing their limitations and their impact on model performance and safety.
Findings
Editing methods cause performance decline on general benchmarks.
Larger models are more resistant to editing.
Model safety is significantly weakened after editing.
Abstract
Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, with many methods excelling across these criteria. Some recent works disclose the pitfalls of these editing methods, such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation of various editing methods and different language models, and have the following findings. (1) Existing editing methods lead to inevitable performance deterioration on general benchmarks, indicating that existing editing methods maintain the general abilities of the model within only a few dozen edits. When the number of edits is slightly large, the intrinsic knowledge structure of the model is disrupted or even…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Focus
