Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference

Jiabao Ji; Yujian Liu; Yang Zhang; Gaowen Liu; Ramana Rao Kompella; Sijia Liu; Shiyu Chang

arXiv:2406.08607·cs.CL·June 14, 2024

TL;DR

This paper introduces ULD, an LLM unlearning framework that forgets specified knowledge by reversing the conventional forget and retain objectives, cutting training time by more than threefold while preserving model utility.

Contribution

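ULD trains an assistant LLM on the reversed unlearning goals and derives the unlearned model from the logit difference between the target LLM and the assistant. This sidesteps the degenerated output and catastrophic forgetting that affect conventional forget-retain optimization, and it is cheaper to train.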
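The "logit difference" in the title can be illustrated with a toy example. The sketch below assumes the combination is a weighted subtraction of the assistant's next-token logits from the target model's logits; the function names, the `alpha` weight, and the three-token vocabulary are hypothetical and chosen only for illustration, not taken from the authors' code.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_difference(target_logits, assistant_logits, alpha=1.0):
    """Assumed form: target logits minus alpha times assistant logits."""
    if len(target_logits) != len(assistant_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    return [t - alpha * a for t, a in zip(target_logits, assistant_logits)]


# Toy vocabulary of three tokens; index 0 stands for the token to forget.
target = [4.0, 1.0, 0.5]     # target LLM strongly prefers the forget token
assistant = [3.5, 0.0, 0.0]  # assistant is confident only on the forget token

unlearned = logit_difference(target, assistant, alpha=1.0)
probs_before = softmax(target)
probs_after = softmax(unlearned)
```

In this toy case the forget token's probability drops after the subtraction, while the relative order of the other two tokens is unchanged because the assistant assigns them equal logits.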

Findings

01 Reduces training time by more than threefold

02 Achieves 0% utility loss on the TOFU benchmark

03 Forgets targeted knowledge while preserving overall capabilities

Abstract

As Large Language Models (LLMs) demonstrate extensive capability in learning from documents, LLM unlearning becomes an increasingly important research area to address concerns of LLMs in terms of privacy, copyright, etc. A conventional LLM unlearning task typically involves two goals: (1) The target LLM should forget the knowledge in the specified forget documents, and (2) it should retain the other knowledge that the LLM possesses, for which we assume access to a small number of retain documents. To achieve both goals, a mainstream class of LLM unlearning methods introduces an optimization framework with a combination of two objectives - maximizing the prediction loss on the forget documents while minimizing that on the retain documents, which suffers from two challenges, degenerated output and catastrophic forgetting. In this paper, we propose a novel unlearning framework called…
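The conventional objective described in the abstract, maximizing the prediction loss on forget documents while minimizing it on retain documents, can be sketched as a single scalar to minimize. This is a minimal illustration, not the paper's implementation: the function names, the `retain_weight` parameter, and the per-token probabilities are all hypothetical.

```python
import math


def nll(token_probs):
    """Mean negative log-likelihood of the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def forget_retain_loss(forget_probs, retain_probs, retain_weight=1.0):
    """Conventional combined objective (assumed form).

    Minimizing this value maximizes the loss on forget documents
    and minimizes the loss on retain documents.
    """
    return -nll(forget_probs) + retain_weight * nll(retain_probs)


retain = [0.9, 0.8]                                  # probabilities on retain tokens
loss_mild = forget_retain_loss([0.5, 0.5], retain)   # forget tokens still likely
loss_extreme = forget_retain_loss([1e-6, 1e-6], retain)  # forget tokens near zero
```

Because the forget term has no lower bound, pushing the forget-token probabilities toward zero keeps lowering the combined loss, which is one plausible reading of why such objectives can end in degenerated output.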

Code & Models

Repositories

ucsb-nlp-chang/uld (PyTorch, official)

Videos

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference · SlidesLive

Taxonomy

Topics: Digital Rights Management and Security · Library Science and Information Systems

Methods: TOFU