Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond

Chongyu Fan; Jinghan Jia; Yihua Zhang; Anil Ramakrishna; Mingyi Hong; Sijia Liu

arXiv:2502.05374·cs.LG·May 28, 2025

TL;DR

This paper makes large language model (LLM) unlearning more robust to relearning attacks by using sharpness-aware minimization (SAM) and related smoothing strategies, and validates the approach on the WMDP and MUSE datasets.

Contribution

The paper establishes a connection between robust unlearning and sharpness-aware minimization (SAM), and introduces smoothing strategies that make unlearning more robust to relearning and jailbreak attacks.

Findings

01. SAM and smoothing strategies improve resistance to relearning attacks

02. Smoothness optimization defends against input-level jailbreaks

03. Experimental results on WMDP and MUSE datasets validate effectiveness

Abstract

The LLM unlearning technique has recently been introduced to comply with data regulations and address the safety and ethical concerns of LLMs by removing the undesired data-model influence. However, state-of-the-art unlearning methods face a critical vulnerability: they are susceptible to "relearning" the removed information from a small number of forget data points, known as relearning attacks. In this paper, we systematically investigate how to make unlearned models robust against such attacks. For the first time, we establish a connection between robust unlearning and sharpness-aware minimization (SAM) through a unified robust optimization framework, in analogy to adversarial training designed to defend against adversarial attacks. Our analysis of SAM reveals that smoothness optimization plays a pivotal role in mitigating relearning attacks. Thus, we further explore diverse…
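
As a rough illustration of the SAM idea the abstract refers to, the sketch below runs a generic SAM update on a toy quadratic loss. It is not the paper's unlearning objective or the code in the official repository: the names `sam_step`, `toy_grad`, `toy_loss` and the values of `lr` and `rho` are all hypothetical, and in an unlearning setting the toy loss would presumably be replaced by a forget objective. Each step first moves to an approximate worst-case point inside a ball of radius `rho` around the current weights, then descends using the gradient taken at that perturbed point.

```python
import math


def sam_step(theta, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM update (illustrative sketch, not the paper's method).

    theta   -- list of floats (current weights)
    grad_fn -- function returning the loss gradient at a given point
    lr, rho -- hypothetical step size and perturbation radius
    """
    g = grad_fn(theta)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Stationary point: no ascent direction, so leave weights unchanged.
        return list(theta)
    # First-order approximation of the inner maximization:
    # step of length rho along the normalized gradient.
    eps = [rho * gi / norm for gi in g]
    theta_adv = [t + e for t, e in zip(theta, eps)]
    # Descend from the original weights using the perturbed-point gradient.
    g_adv = grad_fn(theta_adv)
    return [t - lr * ga for t, ga in zip(theta, g_adv)]


def toy_loss(theta):
    """Toy quadratic, sharp along x and flat along y (assumption for the demo)."""
    x, y = theta
    return 0.5 * (10.0 * x * x + y * y)


def toy_grad(theta):
    """Analytic gradient of toy_loss."""
    x, y = theta
    return [10.0 * x, y]


theta = [1.0, 1.0]
initial_loss = toy_loss(theta)
for _ in range(50):
    theta = sam_step(theta, toy_grad)
final_loss = toy_loss(theta)
```

Because the descent gradient is evaluated at the perturbed point, the iterates settle in a small neighborhood of the minimum whose size scales with `rho` rather than converging to it exactly; that is expected behavior for this kind of update, not a bug in the sketch.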

Code & Models

Repositories

optml-group/unlearn-smooth (PyTorch, official)

Videos

Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond · SlidesLive

Taxonomy

Topics: Network Security and Intrusion Detection · Privacy-Preserving Technologies in Data · Cloud Data Security Solutions

Methods: Sharpness-Aware Minimization