WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

TL;DR
WISE introduces a dual memory scheme with a router and knowledge sharding to improve lifelong model editing of large language models, addressing reliability, generalization, and locality issues.
Contribution
The paper proposes WISE, a novel dual parametric memory system with a routing mechanism and knowledge sharding for effective lifelong model editing.
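The idea of a dual memory with a router and sharded edits can be illustrated with a toy sketch. This is not the authors' implementation: the class `DualMemoryLayer`, the routing score (the norm of the difference between the two memories' outputs), the `threshold` value, the disjoint random masks, and the additive merge are all assumptions made for illustration, since this summary does not specify them.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2(v):
    """Euclidean norm of a vector."""
    return sum(a * a for a in v) ** 0.5

class DualMemoryLayer:
    """Toy layer with a frozen main memory and an editable side memory (hypothetical)."""

    def __init__(self, W_main, threshold):
        self.W_main = [row[:] for row in W_main]  # pretrained weights, never edited
        self.W_side = [row[:] for row in W_main]  # side memory starts as a copy
        self.threshold = threshold                # assumed routing threshold

    def routing_score(self, x):
        # Assumed criterion: how differently the two memories respond to x.
        main_out = matvec(self.W_main, x)
        side_out = matvec(self.W_side, x)
        return l2([s - m for s, m in zip(side_out, main_out)])

    def forward(self, x):
        # Use the side memory only when the input looks like an edited one.
        if self.routing_score(x) > self.threshold:
            return matvec(self.W_side, x), "side"
        return matvec(self.W_main, x), "main"

def make_shard_masks(n_rows, n_cols, n_shards, seed=0):
    """Assign every weight to exactly one shard (a disjoint simplification)."""
    rng = random.Random(seed)
    owner = [[rng.randrange(n_shards) for _ in range(n_cols)] for _ in range(n_rows)]
    return [
        [[1 if owner[i][j] == k else 0 for j in range(n_cols)] for i in range(n_rows)]
        for k in range(n_shards)
    ]

def apply_sharded_edits(layer, shard_updates, masks):
    """Write each shard's update into its own masked subspace of the side memory."""
    for update, mask in zip(shard_updates, masks):
        for i, row in enumerate(update):
            for j, delta in enumerate(row):
                layer.W_side[i][j] += delta * mask[i][j]

# Usage: two shards carry the same update, which touches only input dimension 0.
layer = DualMemoryLayer([[1.0, 0.0], [0.0, 1.0]], threshold=0.5)
masks = make_shard_masks(2, 2, n_shards=2)
update = [[2.0, 0.0], [2.0, 0.0]]
apply_sharded_edits(layer, [update, update], masks)

edited_out, edited_route = layer.forward([1.0, 0.0])        # input affected by the edit
unrelated_out, unrelated_route = layer.forward([0.0, 1.0])  # input untouched by the edit
```

In this toy setup the edited input is routed to the side memory while the unrelated input still goes through the unchanged main memory, which is the locality property the summary refers to. A real system would need a learned or calibrated threshold and a more careful merge of overlapping shards.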
Findings
Outperforms previous editing methods across multiple tasks.
Effectively overcomes the impossible triangle in lifelong editing.
Works across various large language model architectures.
Abstract
Large language models (LLMs) need knowledge updates to meet the ever-growing world facts and correct the hallucinated responses, facilitating the methods of lifelong model editing. Where the updated knowledge resides in memories is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations by retrieval) will result in an impossible triangle -- reliability, generalization, and locality cannot be realized together in the lifelong editing settings. For long-term memory, directly editing the parameters will cause conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor…
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Attention Dropout · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Linear Warmup With Cosine Annealing · Byte Pair Encoding
