Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?
Zexi Li, Xiangzhu Wang, William F. Shen, Meghdad Kurmanji, Xinchi Qiu, Dongqi Cai, Chao Wu, Nicholas D. Lane

TL;DR
This paper explores the connection between knowledge editing and unlearning in large language models and shows that state-of-the-art editing methods can serve as effective unlearning baselines, especially when combined with the practical improvements it proposes.
Contribution
It conceptualizes unlearning as a special case of editing, evaluates editing methods as unlearning baselines, and introduces practical recipes to enhance their effectiveness.
Findings
WISE and AlphaEdit are effective unlearning baselines.
Editing methods excel at generating human-aligned refusal answers.
Proposed recipes improve unlearning performance on longer sequences.
Abstract
Large language model (LLM) unlearning, i.e., selectively removing information from LLMs, is vital for responsible model deployment. In contrast, LLM knowledge editing aims to modify LLM knowledge rather than remove it. Though editing and unlearning seem to be two distinct tasks, we find there is a tight connection between them. In this paper, we conceptualize unlearning as a special case of editing where information is modified to a refusal or "empty set" response, signifying its removal. This paper thus investigates whether knowledge editing techniques are strong baselines for LLM unlearning. We evaluate state-of-the-art (SOTA) editing methods (e.g., ROME, MEMIT, GRACE, WISE, and AlphaEdit) against existing unlearning approaches on pretrained and finetuned knowledge. Results show certain editing methods, notably WISE and AlphaEdit, are effective unlearning baselines, especially…
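The framing of unlearning as editing toward a refusal can be illustrated with a minimal sketch. It only shows how a forget set might be turned into edit requests whose new target is a refusal; it does not implement ROME, MEMIT, GRACE, WISE, or AlphaEdit, and it is not the authors' code. The names `EditRequest`, `forget_set_to_edit_requests`, and the `REFUSAL` string are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical refusal target; the refusal wording used in the paper is not given here.
REFUSAL = "I'm sorry, I can't answer that."


@dataclass(frozen=True)
class EditRequest:
    """One edit: the model should answer `prompt` with `target_new` after editing."""
    prompt: str
    target_new: str


def forget_set_to_edit_requests(forget_prompts: List[str],
                                refusal: str = REFUSAL) -> List[EditRequest]:
    """Map each prompt to forget onto an edit whose new target is a refusal.

    This mirrors the idea of unlearning as a special case of editing:
    instead of rewriting a fact to a new fact, the target is an
    "empty set" style response.
    """
    return [EditRequest(prompt=p, target_new=refusal) for p in forget_prompts]


# Placeholder prompts, not taken from any benchmark.
requests = forget_set_to_edit_requests([
    "Who wrote the novel X?",
    "Where was person Y born?",
])
```

Under this assumption, any editing method that accepts prompt and target pairs could in principle be run on `requests` and then compared against dedicated unlearning approaches.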
Taxonomy
Topics: Natural Language Processing Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate · Rank-One Model Editing
