A General Framework to Enhance Fine-tuning-based LLM Unlearning

Jie Ren; Zhenwei Dai; Xianfeng Tang; Hui Liu; Jingying Zeng; Zhen Li; Rahul Goutam; Suhang Wang; Yue Xing; Qi He; Hui Liu

arXiv:2502.17823·cs.LG·March 25, 2025

TL;DR

This paper introduces GRUN, a general framework for fine-tuning-based LLM unlearning. It builds on a property shared by GA-based and suppression-based methods, and aims to remove the target data more effectively while better preserving the model's ability to respond to normal prompts.

Contribution

The paper proposes Gated Representation UNlearning (GRUN), a general, efficient framework that improves unlearning effectiveness and utility in fine-tuning-based LLM unlearning methods.

Findings

01

GRUN significantly improves unlearning accuracy.

02

GRUN maintains higher model utility after unlearning.

03

The framework is effective across different unlearning scenarios.

Abstract

Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, they often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate the common property between GA-based and suppression-based methods. We unveil that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN) which has two…
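To make the GA-based objective mentioned in the abstract concrete, here is a minimal sketch of a single gradient-ascent step on a toy softmax output. It is not GRUN and not the authors' code: the three-token vocabulary, the logit values, the learning rate, and the function names are all hypothetical, and the update is applied directly to logits rather than to model parameters to keep the example self-contained.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits, target):
    # Negative log-likelihood of the target token under softmax(logits).
    return -math.log(softmax(logits)[target])

def nll_grad(logits, target):
    # Gradient of the NLL w.r.t. the logits: softmax(logits) - one_hot(target).
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def ga_unlearn_step(logits, target, lr=0.5):
    # Gradient ASCENT: step along the gradient so the NLL of the
    # forget-set target goes up (ordinary training would subtract it).
    grad = nll_grad(logits, target)
    return [x + lr * g for x, g in zip(logits, grad)]

# Hypothetical forget-set example: token 0 is the answer to be unlearned.
forget_logits = [2.0, 0.5, -1.0]
forget_target = 0

before = nll(forget_logits, forget_target)
updated = ga_unlearn_step(forget_logits, forget_target)
after = nll(updated, forget_target)
```

After the step, the loss on the forget-set target is higher and its probability is lower. Nothing in this toy update protects behaviour on normal prompts, which is the utility degradation the abstract describes.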

Code & Models

Repositories

renjie3/GRUN
PyTorch · Official
