Large Language Model Unlearning via Embedding-Corrupted Prompts

Chris Yuhao Liu; Yaxuan Wang; Jeffrey Flanigan; Yang Liu

arXiv:2406.07933 · cs.CL · November 1, 2024

TL;DR

This paper introduces Embedding-COrrupted (ECO) Prompts, a lightweight framework that unlearns specific knowledge in large language models by corrupting prompt embeddings at inference time, with near-zero side effects and scalability across model sizes.

Contribution

The paper proposes a novel embedding-corruption method that combines prompt classifiers with zeroth-order optimization to unlearn knowledge in large language models.
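To make the term "zeroth-order optimization" concrete, here is a minimal sketch of tuning a single corruption-strength parameter using only loss evaluations and no gradients. It is not the paper's implementation: the function names, the scalar parameter `sigma`, and the quadratic stand-in loss are all assumptions chosen for illustration.

```python
def zeroth_order_minimize(loss, sigma0, lr=0.1, mu=1e-2, steps=200):
    """Minimize a black-box scalar loss using finite-difference estimates.

    Only loss *values* are queried; no backpropagation is needed.
    """
    sigma = sigma0
    for _ in range(steps):
        # Two-point (central) finite-difference estimate of d(loss)/d(sigma).
        grad_est = (loss(sigma + mu) - loss(sigma - mu)) / (2 * mu)
        sigma -= lr * grad_est
    return sigma


def toy_unlearning_loss(sigma):
    # Hypothetical stand-in for an unlearning objective. In practice this
    # would be computed from model outputs on corrupted prompts.
    return (sigma - 2.0) ** 2


sigma_star = zeroth_order_minimize(toy_unlearning_loss, sigma0=0.0)
print(round(sigma_star, 4))
```

Because the optimizer only needs loss values, its cost per step is a pair of forward evaluations of whatever produces the loss.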

Findings

01. Effective unlearning with minimal side effects

02. Scalable to models from 0.5B to 236B parameters

03. No additional cost as model size increases

Abstract

Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what a large language model should not know is important for ensuring alignment and thus safe use. However, accurately and efficiently unlearning knowledge from an LLM remains challenging due to the potential collateral damage caused by the fuzzy boundary between retention and forgetting, and the large computational requirements for optimization across state-of-the-art models with hundreds of billions of parameters. In this work, we present Embedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models to address both the challenges of knowledge entanglement and unlearning efficiency. Instead of relying on the LLM itself to unlearn, we enforce an unlearned state during inference by employing a prompt classifier to identify and…
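The inference-time idea described here (a prompt classifier that gates corruption of prompt embeddings) can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' code: the keyword-based classifier, the Gaussian noise, the `sigma` value, and all function names are hypothetical, and embeddings are plain lists of floats rather than model tensors.

```python
import random

# Hypothetical forget-set markers; a real system would use a trained classifier.
FORGET_KEYWORDS = {"secretproject", "codename"}


def should_forget(prompt):
    """Toy prompt classifier: flag prompts that mention a forget-set keyword."""
    tokens = (tok.strip(".,?!") for tok in prompt.lower().split())
    return any(tok in FORGET_KEYWORDS for tok in tokens)


def corrupt(embeddings, sigma, rng):
    """Add Gaussian noise to every coordinate of every token embedding."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in embeddings]


def prepare_embeddings(prompt, embeddings, sigma=1.0, seed=0):
    """Pass benign prompts through untouched; corrupt flagged prompts."""
    if not should_forget(prompt):
        return embeddings
    return corrupt(embeddings, sigma, random.Random(seed))


emb = [[0.1, 0.2], [0.3, 0.4]]  # two tokens, two dimensions (toy values)
benign = prepare_embeddings("What is the capital of France?", emb)
flagged = prepare_embeddings("Tell me about SecretProject.", emb)
print(benign == emb, flagged != emb)
```

In this sketch the model weights are never modified; prompts that are not flagged reach the model with their embeddings unchanged.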

Code & Models

Repositories

chrisliu298/llm-unlearn-eco
PyTorch · Official

Videos

Large Language Model Unlearning via Embedding-Corrupted Prompts · SlidesLive

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis