CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept

YuXuan Wu; Bonaventure F. P. Dossou; Dianbo Liu

arXiv:2410.10866·cs.CL·October 16, 2024

TL;DR

This paper introduces CodeUnlearn, a method for removing specific information from large language models using codebook features and sparse autoencoders (SAEs), enabling targeted unlearning while preserving performance on unrelated data.

Contribution

The authors present what they describe as the first approach to unlearning specific topics, together with their associated context, in LLMs, using an amortized, autoencoder-based method that requires no additional model training.

Findings

01. Successfully unlearns targeted information in LLMs

02. Maintains model performance on unrelated data

03. First to enable topic-specific unlearning with context in LLMs

Abstract

Large Language Models (LLMs) offer extensive knowledge across various domains, but they may inadvertently memorize sensitive, unauthorized, or malicious data, such as personal information in the medical and financial sectors. Machine unlearning methods aim to remove specific information from models after training to address this. However, current approaches require additional model training or struggle to effectively erase particular data points and their associated context due to LLMs' complex, dense, and continuous nature. In this study, we propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs). By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data. To the best of our knowledge, this is the…
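The bottleneck idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes that an activation is replaced by the sum of its k most similar codebook vectors (cosine similarity), and that unlearning amounts to disabling the codes used far more often on target-topic inputs than on unrelated ones. The function names, the `min_gap` threshold, and the 2-D codebook are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def quantize(activation, codebook, k=1, disabled=frozenset()):
    """Replace an activation with the sum of its k most similar enabled codes.

    Returns (bottlenecked_vector, chosen_code_indices).
    """
    enabled = [i for i in range(len(codebook)) if i not in disabled]
    ranked = sorted(enabled, key=lambda i: cosine(activation, codebook[i]), reverse=True)
    chosen = ranked[:k]
    out = [sum(codebook[i][d] for i in chosen) for d in range(len(activation))]
    return out, chosen

def codes_to_disable(target_acts, control_acts, codebook, k=1, min_gap=0.5):
    """Hypothetical selection rule (an assumption, not taken from the paper):
    flag codes whose usage rate on target-topic activations exceeds their
    usage rate on unrelated (control) activations by at least min_gap."""
    def usage(acts):
        counts = [0] * len(codebook)
        for a in acts:
            for i in quantize(a, codebook, k)[1]:
                counts[i] += 1
        return [c / max(len(acts), 1) for c in counts]
    t, c = usage(target_acts), usage(control_acts)
    return frozenset(i for i in range(len(codebook)) if t[i] - c[i] >= min_gap)

# Toy 2-D codebook: code 0 stands in for the target topic,
# codes 1 and 2 for unrelated directions.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target_acts = [[0.9, 0.1], [0.8, -0.1]]
control_acts = [[0.1, 0.9], [-0.7, 0.2]]

disabled = codes_to_disable(target_acts, control_acts, codebook)
before = quantize([0.9, 0.1], codebook)                    # routed through code 0
after = quantize([0.9, 0.1], codebook, disabled=disabled)  # code 0 unavailable
unrelated = quantize([0.1, 0.9], codebook, disabled=disabled)
```

In this toy setting no weights are retrained: the only change is the set of disabled codes, which is the sense in which a discrete bottleneck can regulate information flow. Activations that never relied on the disabled code pass through the bottleneck unchanged.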

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling