Retention analysis of edited knowledge after fine-tuning
Fufang Wen, Shichang Zhang

TL;DR
This paper investigates how fine-tuning affects previously edited knowledge in large language models. It shows that edited knowledge is more susceptible to forgetting than knowledge acquired during pre-training, and it proposes methods to improve the retention of edits.
Contribution
It systematically analyzes the interaction between fine-tuning and model editing techniques, highlighting limitations and proposing strategies for more robust knowledge editing.
Findings
Edited knowledge is more prone to forgetting during fine-tuning than intrinsic knowledge.
Augmenting edits with paraphrases improves retention.
Freezing layers related to edited content enhances knowledge preservation.
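The two mitigations above, paraphrase augmentation and layer freezing, can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the authors' code: `augment_with_paraphrases` and `frozen_parameter_names` are illustrative names. The parameter-naming pattern `model.layers.<index>.` is also an assumption; it matches Llama-style models in Hugging Face Transformers. The summary does not say how the paper decides which layers count as "related to edited content", so the sketch simply takes a set of layer indices as input.

```python
import re
from typing import Iterable


def augment_with_paraphrases(
    prompt: str, target: str, paraphrases: Iterable[str]
) -> list[tuple[str, str]]:
    """Expand one edit (prompt -> target) into several edit requests sharing the same target.

    Each paraphrase becomes an extra request, so the editing method sees the fact
    in more than one surface form. Duplicates of the original prompt are skipped.
    """
    requests = [(prompt, target)]
    for p in paraphrases:
        if p != prompt:
            requests.append((p, target))
    return requests


# Assumed naming scheme: "model.layers.<index>.<module>...", as in Llama-style checkpoints.
LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def frozen_parameter_names(param_names: Iterable[str], edited_layers: set[int]) -> list[str]:
    """Return the names of parameters that live in edited layers (to exclude from fine-tuning)."""
    frozen = []
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        # Parameters outside any numbered layer (e.g. embeddings) never match.
        if match and int(match.group(1)) in edited_layers:
            frozen.append(name)
    return frozen
```

In PyTorch, the returned names could then be used to call `requires_grad_(False)` on the matching tensors from `model.named_parameters()` before fine-tuning, so the optimizer leaves the edited layers untouched.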
Abstract
Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model editing methods have emerged as efficient solutions for such updates, offering localized and precise knowledge modification at significantly lower computational cost than continual training. In parallel, LLMs are frequently fine-tuned for a wide range of downstream tasks. However, the effect of fine-tuning on previously edited knowledge remains poorly understood. In this work, we systematically investigate how different fine-tuning objectives interact with various model editing techniques. Our findings show that edited knowledge is substantially more susceptible to forgetting during fine-tuning than intrinsic knowledge acquired through pre-training. This analysis highlights a key limitation of current…
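The comparison in the abstract measures how much edited knowledge, versus intrinsic knowledge, survives fine-tuning. It can be sketched as a retention rate. This is a minimal sketch that assumes exact-match recall on (prompt, answer) pairs and treats a model as a plain callable from prompt to completion. The paper's actual evaluation metric is not given here, and `recall_rate` and `retention` are hypothetical names.

```python
from typing import Callable, Sequence

# Assumed representation: a fact is a (prompt, expected answer) pair.
Fact = tuple[str, str]
Predict = Callable[[str], str]


def recall_rate(predict: Predict, facts: Sequence[Fact]) -> float:
    """Fraction of facts whose prompt the model completes with exactly the expected answer."""
    if not facts:
        return 0.0
    hits = sum(predict(prompt).strip() == answer for prompt, answer in facts)
    return hits / len(facts)


def retention(before: Predict, after: Predict, facts: Sequence[Fact]) -> float:
    """Share of facts recalled before fine-tuning that are still recalled after it.

    Facts the model did not know before fine-tuning are excluded, so the score
    isolates forgetting rather than initial edit failures. Returns 0.0 if none
    of the facts were known beforehand.
    """
    known = [(p, a) for p, a in facts if before(p).strip() == a]
    return recall_rate(after, known)
```

Under this sketch, the paper's main finding corresponds to `retention(edited_model, finetuned_model, edited_facts)` coming out lower than `retention(edited_model, finetuned_model, intrinsic_facts)` for the same fine-tuning run.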
Taxonomy
Topics: Topic Modeling · Model-Driven Software Engineering Techniques · Multimodal Machine Learning Applications
