The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, Diyi Yang

TL;DR
This paper introduces a dynamic, structured query-based framework to evaluate the robustness of unlearning methods in Large Language Models, revealing vulnerabilities especially in multi-hop reasoning scenarios.
Contribution
It presents a novel stress-testing framework that automatically generates complex probes to evaluate unlearning effectiveness beyond static benchmarks.
Findings
Unlearning methods often fail in multi-hop reasoning scenarios.
The framework uncovers new unlearning failures missed by existing benchmarks.
Multi-hop queries can reach the target information through alternative pathways that are less affected by unlearning.
Abstract
Unlearning in Large Language Models (LLMs) aims to enhance safety, mitigate biases, and comply with legal mandates, such as the right to be forgotten. However, existing unlearning methods are brittle: minor query modifications, such as multi-hop reasoning and entity aliasing, can recover supposedly forgotten information. As a result, current evaluation metrics often create an illusion of effectiveness, failing to detect these vulnerabilities due to reliance on static, unstructured benchmarks. We propose a dynamic framework that stress tests unlearning robustness using complex structured queries. Our approach first elicits knowledge from the target model (pre-unlearning) and constructs targeted probes, ranging from simple queries to multi-hop chains, allowing precise control over query difficulty. Our experiments show that the framework (1) shows comparable coverage to existing…
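The probe-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the knowledge elicited from the pre-unlearning model has already been stored as (subject, relation, object) triples, and it uses hypothetical names (`Triple`, `Probe`, `build_probes`), a hypothetical question template, and made-up entities. Query difficulty is controlled by the number of hops in the relation chain that leads to the fact targeted for forgetting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One elicited fact: the `relation` of `subject` is `obj` (hypothetical schema)."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Probe:
    question: str
    answer: str
    hops: int  # 1 = direct query, 2+ = multi-hop chain


def build_probes(triples, target, max_hops=2):
    """Render the target fact as questions of increasing hop count.

    A 1-hop probe names the target's subject directly. Each extra hop
    replaces the named entity with a description built from another
    triple whose object is that entity.
    """
    # Index triples by object so we can find facts that point at an entity.
    incoming = {}
    for t in triples:
        incoming.setdefault(t.obj, []).append(t)

    probes = []
    # Each frontier item: (entity named in the question, relations wrapped around it).
    frontier = [(target.subject, [])]
    for hops in range(1, max_hops + 1):
        next_frontier = []
        for anchor, relations in frontier:
            description = "".join(f"the {r} of " for r in relations) + anchor
            question = f"What is the {target.relation} of {description}?"
            probes.append(Probe(question, target.obj, hops))
            # Extend the chain by one hop through every fact pointing at the anchor.
            for t in incoming.get(anchor, []):
                next_frontier.append((t.subject, relations + [t.relation]))
        frontier = next_frontier
    return probes


# Made-up facts, for illustration only.
target = Triple("Alice Rivera", "employer", "Acme Labs")
triples = [
    target,
    Triple("Project Orion", "lead", "Alice Rivera"),
]

probes = build_probes(triples, target, max_hops=2)
for p in probes:
    print(p.hops, p.question, "->", p.answer)
```

Under these assumptions, each probe would be posed to the unlearned model, and a response that still contains the expected answer on a higher-hop probe would indicate that the fact remains recoverable. Entity aliasing, the other query modification the abstract mentions, is not covered by this sketch.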
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
