Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM

Xiaoyu Wu; Yifei Pang; Terrance Liu; Zhiwei Steven Wu

arXiv:2505.24379 · cs.LG · October 23, 2025

Open Access

TL;DR

This paper shows that exact unlearning in large language models can still leak the removed data: a novel extraction attack recovers sensitive content when an adversary can access both the pre- and post-unlearning models.

Contribution

The authors introduce a data extraction attack that uses signals from the pre-unlearning model to guide the post-unlearning model, significantly improving extraction success rates in practical deployment scenarios.

Findings

01. Extraction success doubles in some benchmarks

02. Attack effective on medical diagnosis dataset

03. Unlearning may increase privacy risks

Abstract

Large Language Models are typically trained on datasets collected from the web, which may inadvertently contain harmful or sensitive personal information. To address growing privacy concerns, unlearning methods have been proposed to remove the influence of specific data from trained models. Of these, exact unlearning -- which retrains the model from scratch without the target data -- is widely regarded as the gold standard for mitigating privacy risks in deployment. In this paper, we revisit this assumption in a practical deployment setting where both the pre- and post-unlearning logits APIs are exposed, such as in open-weight scenarios. Targeting this setting, we introduce a novel data extraction attack that leverages signals from the pre-unlearning model to guide the post-unlearning model, uncovering patterns that reflect the removed data distribution. Combining model guidance with a…
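
To make the threat model concrete, here is a minimal sketch of how an adversary with logits access to both models might combine them at decoding time. The combination rule, the function names, and the weight `w` are illustrative assumptions, not the paper's exact formulation: the abstract is truncated before the method details, so this only shows the general idea of upweighting tokens that the pre-unlearning model prefers more strongly than the post-unlearning model does.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def guided_log_probs(pre_logits, post_logits, w=1.0):
    """Hypothetical guidance rule (an assumption, not the paper's formula).

    Each token's pre-unlearning log-probability is boosted by w times the gap
    between the pre- and post-unlearning log-probabilities, then renormalized.
    With w = 0 this reduces to the pre-unlearning model alone.
    """
    pre = log_softmax(pre_logits)
    post = log_softmax(post_logits)
    scores = [p + w * (p - q) for p, q in zip(pre, post)]
    return log_softmax(scores)

def greedy_next_token(pre_logits, post_logits, w=1.0):
    """Return the index of the highest-scoring token under the guided distribution."""
    lp = guided_log_probs(pre_logits, post_logits, w)
    return max(range(len(lp)), key=lp.__getitem__)

# Toy 3-token vocabulary: token 1 is much likelier before unlearning than after.
pre_logits = [2.0, 1.5, 0.0]
post_logits = [2.0, 0.0, 0.0]
unguided = greedy_next_token(pre_logits, post_logits, w=0.0)
guided = greedy_next_token(pre_logits, post_logits, w=1.0)
```

In this toy case the pre-unlearning model alone would pick token 0, while the guided score picks token 1, the token whose probability dropped most after retraining. That gap is the kind of signal the abstract describes as reflecting the removed data distribution.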

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling