UNLEARN: Efficient Removal of Knowledge in Large Language Models

Tyler Lizzo; Larry Heck

arXiv:2408.04140·cs.CL·August 9, 2024

TL;DR

This paper introduces UNLEARN, a method for efficiently removing specific knowledge from large language models without retraining, achieving high forgetting accuracy while preserving overall performance.

Contribution

The paper presents a novel subspace-based approach for targeted knowledge removal and introduces LEARN for knowledge addition, advancing model editing capabilities.

Findings

01. 96% of the targeted knowledge can be forgotten.

02. Performance on other knowledge stays within 2.5% of the original model.

03. Outperforms the previous state of the art in knowledge removal.

Abstract

Given the prevalence of large language models (LLMs) and the prohibitive cost of training these models from scratch, dynamically forgetting specific knowledge (e.g., private or proprietary information) without retraining the model has become an important capability. This paper proposes a novel method, called UNLEARN, to achieve this objective. The approach builds upon subspace methods to identify the targeted knowledge and remove it without adversely affecting other knowledge in the LLM. Results demonstrate that 96% of targeted knowledge can be forgotten while performance on other knowledge stays within 2.5% of the original model, significantly outperforming the discriminatory abilities of the previous state of the art. A dual method called LEARN is also proposed for targeted knowledge addition. Results show LEARN can match the fine-tuning accuracy of Low-Rank Adaptation (LoRA) without…
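The abstract says only that the approach "builds upon subspace methods"; it does not give the algorithm. As a rough illustration of the general idea of removing a subspace from a weight matrix, the sketch below projects each row of a matrix onto the orthogonal complement of a "forget" subspace. This is a generic linear-algebra example, not the authors' method: the function names `orthonormalize` and `remove_subspace`, the row-wise treatment of weights, and the assumption that the forget subspace is already known are all hypothetical choices made for illustration.

```python
from math import sqrt


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the components along directions already in the basis.
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def remove_subspace(weight_rows, forget_vectors):
    """Project every row onto the orthogonal complement of the forget subspace.

    Hypothetical helper: `forget_vectors` is assumed to span the directions
    associated with the knowledge to remove; how such directions would be
    found is not covered here.
    """
    basis = orthonormalize(forget_vectors)
    cleaned = []
    for row in weight_rows:
        r = list(row)
        for b in basis:
            coeff = sum(ri * bi for ri, bi in zip(r, b))
            r = [ri - coeff * bi for ri, bi in zip(r, b)]
        cleaned.append(r)
    return cleaned


if __name__ == "__main__":
    weights = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    forget = [[2.0, 0.0, 0.0]]  # forget everything along the first axis
    print(remove_subspace(weights, forget))
```

In this toy setting, any row component lying in the forget subspace is zeroed while components orthogonal to it are left untouched, which mirrors the stated goal of removing targeted knowledge without disturbing the rest.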

Taxonomy

Topics: Topic Modeling