UNLEARN Efficient Removal of Knowledge in Large Language Models
Tyler Lizzo, Larry Heck

TL;DR
This paper introduces UNLEARN, a method for efficiently removing specific knowledge from large language models without retraining, achieving high forgetting accuracy while preserving overall performance.
Contribution
The paper presents a novel subspace-based approach for targeted knowledge removal and introduces LEARN for knowledge addition, advancing model editing capabilities.
Findings
96% of targeted knowledge can be forgotten
Maintains performance on other knowledge within 2.5% of the original model
Outperforms previous state-of-the-art in knowledge removal
Abstract
Given the prevalence of large language models (LLMs) and the prohibitive cost of training these models from scratch, dynamically forgetting specific knowledge (e.g., private or proprietary information) without retraining the model has become an important capability. This paper proposes a novel method, called UNLEARN, to achieve this objective. The approach builds upon subspace methods to identify and specifically target the removal of knowledge without adversely affecting other knowledge in the LLM. Results demonstrate that 96% of targeted knowledge can be forgotten while maintaining performance on other knowledge within 2.5% of the original model, significantly outperforming the discriminatory abilities of the previous state-of-the-art. A dual method called LEARN is also proposed for targeted knowledge addition. Results show LEARN can match the fine-tuning accuracy of Low-Rank Adaptation (LoRA) without…
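The abstract only names "subspace methods", so the following is a generic illustration of the idea rather than the authors' algorithm. It is a minimal NumPy sketch, assuming that the knowledge to forget and the knowledge to retain can each be summarized by a set of direction vectors in weight space. All names here (`forget_dirs`, `retain_dirs`, `remove_subspace`) are hypothetical. The sketch first strips from the forget directions anything shared with the retain directions, then projects the weights away from what is left.

```python
import numpy as np


def orthonormal_basis(vectors, tol=1e-10):
    """Return orthonormal columns spanning the columns of `vectors`."""
    u, s, _ = np.linalg.svd(vectors, full_matrices=False)
    return u[:, s > tol]


def remove_subspace(weights, forget_dirs, retain_dirs):
    """Project `weights` away from the forget-only subspace.

    Assumption (not from the paper): each knowledge area is represented
    by column vectors with the same row dimension as `weights`.
    """
    retain = orthonormal_basis(retain_dirs)
    # Drop the part of the forget directions that overlaps the retain subspace.
    residual = forget_dirs - retain @ (retain.T @ forget_dirs)
    forget_only = orthonormal_basis(residual)
    # Subtract the component of the weights lying in the forget-only subspace.
    return weights - forget_only @ (forget_only.T @ weights)


# Toy data: an 8-dimensional weight space with 2 forget and 2 retain directions.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 5))
forget_dirs = rng.normal(size=(8, 2))
retain_dirs = rng.normal(size=(8, 2))

edited = remove_subspace(weights, forget_dirs, retain_dirs)
```

Because the forget-only basis is orthogonal to the retain directions by construction, the projection of the weights onto the retain directions is unchanged in this toy setting. Whether UNLEARN separates overlapping knowledge in the same way is not stated in the text above.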
Taxonomy
Topics: Topic Modeling
