Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
Chongyu Fan, Jinghan Jia, Yihua Zhang, Anil Ramakrishna, Mingyi Hong, Sijia Liu

TL;DR
This paper studies how to make large language model (LLM) unlearning robust to relearning attacks. It uses sharpness-aware minimization (SAM) and related smoothing strategies, and reports improved resistance on the WMDP and MUSE benchmarks.
Contribution
The paper connects robust unlearning to sharpness-aware minimization (SAM) and introduces smoothing strategies that make unlearning more robust to both relearning and jailbreak attacks.
Findings
- SAM and smoothing strategies improve resistance to relearning attacks
- Smoothness optimization defends against input-level jailbreaks
- Experimental results on WMDP and MUSE datasets validate effectiveness
Abstract
The LLM unlearning technique has recently been introduced to comply with data regulations and address the safety and ethical concerns of LLMs by removing the undesired data-model influence. However, state-of-the-art unlearning methods face a critical vulnerability: they are susceptible to "relearning" the removed information from a small number of forget data points, known as relearning attacks. In this paper, we systematically investigate how to make unlearned models robust against such attacks. For the first time, we establish a connection between robust unlearning and sharpness-aware minimization (SAM) through a unified robust optimization framework, in an analogy to adversarial training designed to defend against adversarial attacks. Our analysis for SAM reveals that smoothness optimization plays a pivotal role in mitigating relearning attacks. Thus, we further explore diverse…
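The SAM idea the abstract refers to can be illustrated with a minimal sketch. This is the generic two-step SAM update (perturb the weights toward higher loss within a small L2 ball, then descend using the gradient at the perturbed point), applied to a toy quadratic rather than an actual unlearning loss. The names `loss`, `grad`, `sam_step`, `lr`, and `rho` are illustrative assumptions and are not taken from the authors' code.

```python
import math


def loss(w):
    # Toy stand-in for an unlearning objective (assumption): an
    # ill-conditioned quadratic with its minimum at the origin.
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], w[1]]


def sam_step(w, lr=0.05, rho=0.1):
    """One generic SAM update on the toy loss.

    Step 1 (ascent): move to w + rho * g / ||g||, the first-order
    worst-case point inside an L2 ball of radius rho.
    Step 2 (descent): update the ORIGINAL weights with the gradient
    evaluated at that perturbed point.
    """
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def run(steps=50):
    # Hypothetical driver: start at (1, 1) and apply SAM repeatedly.
    w = [1.0, 1.0]
    for _ in range(steps):
        w = sam_step(w)
    return w
```

In an unlearning setting the toy `loss` would be replaced by the forget-set objective, so that the weight perturbation plays the role of a worst-case relearning step; that mapping is a reading of the abstract's adversarial-training analogy, not a reproduction of the paper's algorithm.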
Code & Models
- OPTML-Group/NPO-WMDP (model, 41 downloads)
- OPTML-Group/NPO-RS-WMDP (model, 1 download)
- OPTML-Group/NPO-GP-WMDP (model, 4 downloads)
- OPTML-Group/NPO-WA-WMDP (model, 1 download)
- OPTML-Group/NPO-CR-WMDP (model, 4 downloads)
- OPTML-Group/GradDiff-WMDP (model, 12 downloads)
- OPTML-Group/GradDiff-SAM-WMDP (model, 8 downloads)
- OPTML-Group/NPO-SAM-WMDP (model, 4 downloads)
- OPTML-Group/NPO-SAM-MUSE-NEWS (model, 67 downloads)
- OPTML-Group/NPO-SAM-MUSE-BOOKS (model, 165 downloads)
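The checkpoints listed above are Hugging Face Hub repositories, so they can presumably be loaded with the `transformers` Auto classes. The sketch below assumes each repository is a causal language model that also ships tokenizer files; neither point is confirmed by this page. The helper names `repo_id` and `load_checkpoint` are hypothetical.

```python
HF_ORG = "OPTML-Group"

# Checkpoint names exactly as listed on this page.
CHECKPOINTS = [
    "NPO-WMDP",
    "NPO-RS-WMDP",
    "NPO-GP-WMDP",
    "NPO-WA-WMDP",
    "NPO-CR-WMDP",
    "GradDiff-WMDP",
    "GradDiff-SAM-WMDP",
    "NPO-SAM-WMDP",
    "NPO-SAM-MUSE-NEWS",
    "NPO-SAM-MUSE-BOOKS",
]


def repo_id(name):
    # Build the full Hub repository id, rejecting names not listed above.
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {name}")
    return f"{HF_ORG}/{name}"


def load_checkpoint(name):
    # Imported lazily so the helpers above work without transformers installed.
    # Assumption: the repo loads as a causal LM and includes a tokenizer.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    rid = repo_id(name)
    tokenizer = AutoTokenizer.from_pretrained(rid)
    model = AutoModelForCausalLM.from_pretrained(rid)
    return tokenizer, model
```

Calling `load_checkpoint("NPO-SAM-WMDP")` would download the weights on first use, so it needs network access and enough disk space and memory for the model.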
Taxonomy
Topics: Network Security and Intrusion Detection · Privacy-Preserving Technologies in Data · Cloud Data Security Solutions
Methods: Sharpness-Aware Minimization
