Existing Large Language Model Unlearning Evaluations Are Inconclusive
Zhili Feng, Yixuan Even Xu, Alexander Robey, Robert Kirk, Xander Davies, Yarin Gal, Avi Schwarzschild, J. Zico Kolter

TL;DR
This paper critically examines current evaluation methods for large language model unlearning, revealing significant limitations and proposing improved evaluation principles to better assess unlearning effectiveness.
Contribution
The paper identifies key flaws in existing evaluation practices and introduces two principles, minimal information injection and downstream task awareness, to improve unlearning assessment.
Findings
Some evaluations inject substantial new information during testing, which can mask true unlearning performance.
Evaluation results vary greatly across different tasks.
Many evaluations rely on spurious correlations, reducing trustworthiness.
Abstract
Machine unlearning aims to remove sensitive or undesired data from large language models. However, recent studies suggest that unlearning is often shallow, claiming that removed knowledge can easily be recovered. In this work, we critically examine standard unlearning evaluation practices and uncover key limitations that shake our trust in those findings. First, we show that some evaluations introduce substantial new information into the model, potentially masking true unlearning performance by re-teaching the model during testing. Second, we demonstrate that evaluation outcomes vary significantly across tasks, undermining the generalizability of current evaluation routines. Finally, we find that many evaluations rely on spurious correlations, making their results difficult to trust and interpret. Taken together, these issues suggest that current evaluation protocols may both overstate…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
