Existing Large Language Model Unlearning Evaluations Are Inconclusive

Zhili Feng; Yixuan Even Xu; Alexander Robey; Robert Kirk; Xander Davies; Yarin Gal; Avi Schwarzschild; J. Zico Kolter

arXiv:2506.00688·cs.LG·June 3, 2025

TL;DR

This paper critically examines current evaluation methods for large language model unlearning, revealing significant limitations and proposing improved evaluation principles to better assess unlearning effectiveness.

Contribution

The paper identifies key flaws in existing evaluation practices and introduces two principles, minimal information injection and downstream task awareness, to improve unlearning assessment.

Findings

01. Current evaluations can mask true unlearning performance.

02. Evaluation results vary greatly across different tasks.

03. Many evaluations rely on spurious correlations, reducing trustworthiness.

Abstract

Machine unlearning aims to remove sensitive or undesired data from large language models. However, recent studies suggest that unlearning is often shallow, claiming that removed knowledge can easily be recovered. In this work, we critically examine standard unlearning evaluation practices and uncover key limitations that shake our trust in those findings. First, we show that some evaluations introduce substantial new information into the model, potentially masking true unlearning performance by re-teaching the model during testing. Second, we demonstrate that evaluation outcomes vary significantly across tasks, undermining the generalizability of current evaluation routines. Finally, we find that many evaluations rely on spurious correlations, making their results difficult to trust and interpret. Taken together, these issues suggest that current evaluation protocols may both overstate…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling