Machine Unlearning in Large Language Models

Kongyang Chen; Zixin Wang; Bing Mi; Waixi Liu; Shaowei Wang; Xiaojun Ren; Jiaxing Shen

arXiv:2404.16841 · cs.CR · April 29, 2024 · 2 cites

Open Access

TL;DR

This paper proposes a novel machine unlearning framework for large language models to enhance privacy and security by preventing harmful or sensitive outputs while maintaining their overall performance.

Contribution

It introduces a new unlearning method using evaluative models and specialized loss functions to selectively erase undesirable knowledge in LLMs without degrading their capabilities.

Findings

01 Effective unlearning of harmful outputs demonstrated

02 Model performance remains largely intact after unlearning

03 Approach enhances privacy and security in LLMs

Abstract

Recently, large language models (LLMs) have emerged as a notable field, attracting significant attention for their ability to automatically generate intelligent content across various application domains. However, LLMs still suffer from significant security and privacy issues. For example, LLMs might expose private user data under hacking attacks or targeted prompts. To address this problem, this paper introduces a novel machine unlearning framework for LLMs. Our objectives are to keep LLMs from producing harmful, hallucinatory, or privacy-compromising responses while retaining their standard output capabilities. To accomplish this, we use an evaluative model to pinpoint dialogues needing unlearning. We also establish a distance loss to function as the model's negative loss, diverting it from previous undesirable outputs. Furthermore, we determine the expected output's cluster mean to formulate a…
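The three ingredients named in the abstract (an evaluative model that flags dialogues, a distance term used as a negative loss, and a cluster mean over expected outputs) can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the score threshold, the use of plain float vectors as stand-ins for model outputs, and the choice of Euclidean distance are all assumptions made for illustration. Because the abstract is truncated, the sketch only computes the cluster mean and does not guess how the paper uses it.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def select_for_unlearning(
    dialogues: List[str],
    score_fn: Callable[[str], float],  # hypothetical evaluative model: higher = more undesirable
    threshold: float = 0.5,            # assumed cutoff, not taken from the paper
) -> List[str]:
    """Return the dialogues that the evaluative model flags for unlearning."""
    return [d for d in dialogues if score_fn(d) >= threshold]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_distance_loss(output: Vector, undesirable: Vector) -> float:
    """Negated distance: minimizing it pushes `output` away from `undesirable`."""
    return -euclidean(output, undesirable)


def cluster_mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a cluster of expected-output vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

In this toy setting, an output that sits farther from the undesirable vector yields a lower loss value, which is the sense in which a distance can act as a negative loss that diverts the model from earlier outputs.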

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques