Does Machine Unlearning Truly Remove Knowledge?

Haokun Chen; Yueqi Zhang; Yuan Bi; Yao Zhang; Tong Liu; Jinhe Bi; Jian Lan; Jindong Gu; Claudia Grosser; Denis Krompass; Nassir Navab; Volker Tresp

arXiv:2505.23270 · cs.LG · October 14, 2025

TL;DR

This paper introduces a comprehensive framework for evaluating the effectiveness of machine unlearning algorithms in large language models, addressing the challenge of verifying knowledge removal through novel auditing techniques.

Contribution

The paper presents an auditing framework comprising benchmark datasets, multiple unlearning algorithms, and auditing methods that perturb intermediate activations.

Findings

01. Auditing algorithms reveal varying effectiveness of unlearning strategies.

02. Prompt-based methods have limitations that can be addressed by activation perturbation techniques.

03. The framework enables systematic evaluation of unlearning robustness and efficacy.
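
The idea behind finding 02 can be illustrated with a toy sketch. This is not the paper's auditing method, only a minimal stand-in under stated assumptions: the "model" is a hypothetical two-layer linear map, the weights `W_IN` and `W_OUT` are invented for illustration, token 0 plays the role of a refusal, and token 1 plays the role of the supposedly forgotten answer. The helper `leak_rate` (a hypothetical name) adds Gaussian noise to the hidden activation and reports how often the forgotten token wins.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def leak_rate(w_in, w_out, x, target, sigma, trials=200, seed=0):
    """Fraction of noisy forward passes whose top token is `target`.

    Noise is added to the intermediate (hidden) activation, not to the input.
    """
    rng = random.Random(seed)
    hidden = matvec(w_in, x)  # clean intermediate activation
    hits = 0
    for _ in range(trials):
        noisy = [h + rng.gauss(0.0, sigma) for h in hidden]
        if argmax(matvec(w_out, noisy)) == target:
            hits += 1
    return hits / trials


# Hypothetical toy weights: the forgotten token (index 1) is only
# narrowly suppressed relative to the refusal token (index 0).
W_IN = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [[1.0, 0.0], [0.9, 1.0]]
PROMPT = [1.0, 0.0]

clean_rate = leak_rate(W_IN, W_OUT, PROMPT, target=1, sigma=0.0)
noisy_rate = leak_rate(W_IN, W_OUT, PROMPT, target=1, sigma=1.0)
print(clean_rate, noisy_rate)
```

In this toy setting the unperturbed pass never emits the forgotten token, so an audit that only inspects the clean output would report success, while the perturbed passes expose that the answer is still close to the surface.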

Abstract

In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of…

Taxonomy

Methods: Softmax · Attention Is All You Need