Machine Unlearning in Large Language Models

Saaketh Koundinya Gundavarapu; Shreya Agarwal; Arushi Arora; Chandana; Thimmalapura Jagadeeshaiah

arXiv:2405.15152 · cs.CL · May 27, 2024 · 2 citations

TL;DR

This paper proposes a gradient ascent-based method for machine unlearning in large language models to selectively erase harmful responses and copyrighted content, improving ethical standards while preserving general knowledge.

Contribution

The paper introduces a novel unlearning approach that combines gradient ascent with LoRA (low-rank adaptation) fine-tuning to remove specific harmful or copyrighted information from LLMs.
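
LoRA fine-tuning keeps the pretrained weight matrix W frozen and trains only a low-rank update, so the layer computes W x + (alpha / r) B A x. The dependency-free sketch below illustrates that generic idea; the class name `LoRALinear`, the rank and alpha values, and the initialization scale are illustrative assumptions, not code from the authors' PyTorch repository. B starts at zero, so the adapted layer initially behaves exactly like the frozen one.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical LoRA layer: frozen W plus trainable (alpha / r) * B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.W = [list(row) for row in W]  # frozen; never updated
        d_out, d_in = len(W), len(W[0])
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the update is a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # r * d_in entries in A plus d_out * r entries in B.
        return self.rank * (len(self.A[0]) + len(self.B))


if __name__ == "__main__":
    W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = LoRALinear(W, rank=1, alpha=2.0)
    print(layer.forward([1.0, 2.0, 3.0, 4.0]))
    print("trainable:", layer.trainable_params(), "vs full:", 4 * 4)
```

Only A and B would receive gradient updates during unlearning, which is why LoRA makes fine-tuning a large model cheaper: for a d_out x d_in layer, the adapter has r (d_in + d_out) parameters instead of d_out d_in.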

Findings

1. 75% reduction in harmful responses on the PKU dataset
2. Effective removal of copyrighted content from LLMs
3. A new evaluation technique for unlearning effectiveness

Abstract

Machine unlearning, a novel area within artificial intelligence, addresses the challenge of selectively forgetting or reducing undesirable knowledge or behaviors in machine learning models, particularly large language models (LLMs). This paper introduces a methodology to align LLMs, such as Open Pre-trained Transformer (OPT) language models, with ethical, privacy, and safety standards by leveraging the gradient ascent algorithm for knowledge unlearning. Our approach aims to selectively erase or modify learned information in LLMs, and it is dual-pronged: it targets both harmful responses and copyrighted content to enhance the ethical and safe behavior of these models. To mitigate harmful responses, we applied gradient ascent on the PKU dataset, achieving a 75%…
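
Gradient ascent unlearning flips the sign of the usual training step: instead of lowering the negative log-likelihood (NLL) of a target sequence, the model moves its parameters along the gradient so that the NLL of the "forget" data rises. The toy sketch below shows only that sign flip, assuming a hypothetical three-word vocabulary and a model that is nothing more than one vector of logits; it is not the authors' implementation, which operates on full transformer LLMs.

```python
import math

# Hypothetical toy vocabulary; a real LLM applies the same idea to the
# token-level cross-entropy of every token in a forget-set sequence.
VOCAB = ["safe", "helpful", "harmful"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll_grad(logits, target):
    """Gradient of -log p[target] with respect to the logits: p - onehot."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def gradient_ascent_unlearn(logits, forget_target, lr=0.5, steps=20):
    """Raise the NLL of the forget token by stepping *along* its gradient."""
    logits = list(logits)
    for _ in range(steps):
        grad = nll_grad(logits, forget_target)
        logits = [z + lr * g for z, g in zip(logits, grad)]  # '+' = ascent
    return logits


if __name__ == "__main__":
    start = [0.0, 0.5, 2.0]  # the toy model initially prefers "harmful"
    target = VOCAB.index("harmful")
    before = softmax(start)[target]
    after = softmax(gradient_ascent_unlearn(start, target))[target]
    print(f"p(harmful): {before:.3f} -> {after:.3f}")
```

Because the NLL is convex in the logits, every ascent step here strictly increases it, so the probability of the forgotten token can only fall. Unbounded ascent on a real model can also degrade unrelated outputs, which is one reason unlearning methods commonly cap the number of steps or add a term that preserves behavior on retained data; this sketch omits any such safeguard.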

Code & Models

Repositories

shreya1313/llm-unlearning
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections