LLM Unlearning Should Be Form-Independent
Xiaotian Ye, Mengqi Zhang, Shu Wu

TL;DR
This paper identifies form-dependent bias in LLM unlearning methods, introduces a benchmark (ORT) to evaluate unlearning robustness, and proposes ROCR, a training-free technique that improves unlearning effectiveness by targeting invariant concepts.
Contribution
It formally characterizes form-dependent bias in LLM unlearning, introduces the ORT benchmark for evaluating it, and proposes ROCR, a novel training-free method for improving unlearning robustness.
Findings
Form-dependent bias is widespread and severe in current unlearning methods.
ROCR significantly outperforms traditional unlearning techniques.
ROCR can modify model perceptions within seconds, producing natural outputs.
Abstract
Large Language Model (LLM) unlearning aims to erase or suppress undesirable knowledge within the model, offering promise for controlling harmful or private information to prevent misuse. However, recent studies highlight its limited efficacy in real-world scenarios, hindering practical adoption. In this study, we identify a pervasive issue underlying many downstream failures: the effectiveness of existing unlearning methods heavily depends on the form of training samples and frequently fails to generalize to alternate expressions of the same knowledge. We formally characterize this problem as Form-Dependent Bias and systematically investigate its specific manifestation patterns across various downstream tasks. To quantify its prevalence and support future research, we introduce ORT, a novel benchmark designed to evaluate the robustness of unlearning methods against variations in…
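To make the idea of form-dependent bias concrete, the toy below is a minimal sketch under loudly stated assumptions. It is not the ORT benchmark, not ROCR, and not how either is implemented. A hypothetical "model" maps several surface forms of one question to a shared concept. Unlearning that suppresses only the exact training prompt leaves the other forms untouched. Changing what the shared concept resolves to covers every form. The names `FORMS`, `FACTS`, `make_model`, and `forget_rate`, along with the train/alternate "gap", are illustrative inventions.

```python
"""Toy illustration of form-dependent bias (hypothetical; not ORT or ROCR)."""
from typing import Callable, Dict, List

# Hypothetical knowledge: several surface forms (prompts) share one concept,
# and that concept resolves to the stored fact.
FORMS: Dict[str, str] = {
    "Who wrote Hamlet?": "hamlet_author",
    "Hamlet was written by": "hamlet_author",
    "Q: Hamlet's playwright? A:": "hamlet_author",
    "Name the author of the play Hamlet.": "hamlet_author",
}
FACTS: Dict[str, str] = {"hamlet_author": "Shakespeare"}
TARGET = "Shakespeare"  # the fact we want the toy model to forget


def make_model(blocked_prompts: set, facts: Dict[str, str]) -> Callable[[str], str]:
    """Build a toy 'model' that answers via concept lookup unless the exact prompt is blocked."""
    def model(prompt: str) -> str:
        if prompt in blocked_prompts:
            return "I don't know."
        return facts[FORMS[prompt]]
    return model


def forget_rate(model: Callable[[str], str], prompts: List[str]) -> float:
    """Fraction of prompts for which the model no longer emits the target fact."""
    return sum(model(p) != TARGET for p in prompts) / len(prompts)


train_forms = ["Who wrote Hamlet?"]  # the only form seen during "unlearning"
alt_forms = [p for p in FORMS if p not in train_forms]  # unseen expressions

# Form-level unlearning: suppress only the exact training prompt.
form_level = make_model(set(train_forms), FACTS)
# Concept-level redirection: change what the shared concept resolves to.
concept_level = make_model(set(), {**FACTS, "hamlet_author": "an unknown author"})

for name, m in [("form-level", form_level), ("concept-level", concept_level)]:
    gap = forget_rate(m, train_forms) - forget_rate(m, alt_forms)
    print(f"{name}: train={forget_rate(m, train_forms):.2f} "
          f"alt={forget_rate(m, alt_forms):.2f} gap={gap:.2f}")
```

Running it shows a train/alternate gap of 1.00 for the form-level edit, which forgets the training form but none of the three alternates. The concept-level edit shows a gap of 0.00. A real LLM is far less cleanly factored than this lookup table, so the toy shows only the shape of the failure, not its measured severity.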
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Misinformation and Its Impacts
