Offset Unlearning for Large Language Models

James Y. Huang; Wenxuan Zhou; Fei Wang; Fred Morstatter; Sheng Zhang; Hoifung Poon; Muhao Chen

arXiv:2404.11045 · cs.CL · May 29, 2025 · 1 citation

TL;DR

This paper introduces δ-Unlearning, a method that unlearns sensitive data from black-box large language models by learning a logit offset from a pair of smaller models instead of tuning the black-box model itself, addressing ethical and legal concerns while preserving performance.

Contribution

The paper presents δ-Unlearning, a versatile offset unlearning framework applicable to black-box LLMs that requires neither access to internal model weights nor retention of sensitive data.

Findings

01. Effectively unlearns target data from black-box LLMs.

02. Maintains or improves performance on general tasks.

03. Compatible with various unlearning algorithms.

Abstract

Despite the strong capabilities of Large Language Models (LLMs) to acquire knowledge from their training corpora, the memorization of sensitive information in the corpora such as copyrighted, biased, and private content has led to ethical and legal concerns. In response to these challenges, unlearning has emerged as a potential remedy for LLMs affected by problematic training data. However, previous unlearning techniques are either not applicable to black-box LLMs due to required access to model internal weights, or violate data protection principles by retaining sensitive data for inference-time correction. We propose δ-Unlearning, an offset unlearning framework for black-box LLMs. Instead of tuning the black-box LLM itself, δ-Unlearning learns the logit offset needed for unlearning by contrasting the logits from a pair of smaller models. Experiments demonstrate that…
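The offset idea in the abstract can be illustrated with a toy sketch. It assumes the black-box model exposes next-token logits, that the two smaller models share its vocabulary, and that one of them has been tuned with some unlearning objective while the other is its untuned copy. The function name `offset_logits`, the scaling factor `alpha`, and all numbers are hypothetical illustrations, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def offset_logits(black_box, small_tuned, small_base, alpha=1.0):
    """Add the contrast between two small models to the black-box logits.

    Hypothetical helper: `alpha` is an assumed strength knob, not a
    parameter confirmed by the source.
    """
    return [
        b + alpha * (t - s)
        for b, t, s in zip(black_box, small_tuned, small_base)
    ]


# Toy 3-token vocabulary; token 0 stands in for the content to be unlearned.
black_box = [4.0, 1.0, 0.5]     # logits from the frozen black-box LLM
small_base = [2.0, 0.5, 0.0]    # small model before unlearning
small_tuned = [-1.0, 0.5, 0.0]  # same small model after unlearning

adjusted = offset_logits(black_box, small_tuned, small_base)
probs_before = softmax(black_box)
probs_after = softmax(adjusted)

print("token 0 probability before:", round(probs_before[0], 3))
print("token 0 probability after: ", round(probs_after[0], 3))
```

Only the small model's weights would change during training in such a setup; the black-box model contributes logits but is never updated, which is what makes the approach usable without access to its weights.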

Code & Models

Repositories

ucsb-nlp-chang/uld
pytorch

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling