Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

Rongzhe Wei; Peizhi Niu; Hans Hao-Hsun Hsu; Ruihan Wu; Haoteng Yin; Mohsen Ghassemi; Yifan Li; Vamsi K. Potluru; Eli Chien; Kamalika Chaudhuri; Olgica Milenkovic; Pan Li

arXiv:2506.05735 · cs.CL · October 23, 2025

Open Access

TL;DR

This paper introduces a new evaluation framework for machine unlearning in LLMs that considers implicit knowledge dependencies and uses LLM-based reasoning to assess unlearning success more accurately.

Contribution

It proposes a knowledge graph-based evaluation method with LLM judges, addressing limitations of existing explicit fact removal approaches.

Findings

01. Current methods overestimate unlearning effectiveness.

02. The framework provides a more realistic assessment of unlearning.

03. Experiments show improved evaluation accuracy.

Abstract

Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are…
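The core idea in the abstract, that a fact presumed forgotten can persist through correlated facts, can be illustrated with a toy sketch. This is not the paper's implementation: the `Triple` class, the hop-based `extract_subgraph`, and the path-product score in `best_indirect_support` are all hypothetical stand-ins chosen for illustration, and the path-product heuristic merely takes the place of the LLM judge that the paper uses to reason over the subgraph. The entities and confidence values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge with a hypothetical confidence in [0, 1]."""
    head: str
    relation: str
    tail: str
    confidence: float


def extract_subgraph(triples, seeds, hops=1):
    """Collect triples within `hops` steps of any seed entity."""
    frontier, visited, chosen = set(seeds), set(seeds), []
    for _ in range(hops):
        nxt = set()
        for t in triples:
            if t in chosen:
                continue
            if t.head in frontier or t.tail in frontier:
                chosen.append(t)
                nxt.update({t.head, t.tail} - visited)
        visited |= nxt
        frontier = nxt
    return chosen


def best_indirect_support(subgraph, target):
    """Toy stand-in for a judge: the largest product of confidences over
    simple paths from target.head to target.tail that avoid the target
    triple itself. Returns 0.0 if no such path exists."""
    edges = [t for t in subgraph if t != target]

    def dfs(node, seen, score):
        if node == target.tail:
            return score
        best = 0.0
        for t in edges:
            # Edges are treated as undirected for this illustration.
            if t.head == node and t.tail not in seen:
                nxt = t.tail
            elif t.tail == node and t.head not in seen:
                nxt = t.head
            else:
                continue
            best = max(best, dfs(nxt, seen | {nxt}, score * t.confidence))
        return best

    return dfs(target.head, {target.head}, 1.0)


# Invented example: the target fact looks "forgotten" (low confidence),
# but two correlated facts still connect the same entities.
target = Triple("Alice", "spouse", "Bob", 0.05)
triples = [
    target,
    Triple("Alice", "parent_of", "Carol", 0.9),
    Triple("Bob", "parent_of", "Carol", 0.8),
    Triple("Dave", "lives_in", "Paris", 0.95),
]
sub = extract_subgraph(triples, {"Alice", "Bob"}, hops=1)
support = best_indirect_support(sub, target)
print(f"direct confidence: {target.confidence}, indirect support: {support:.2f}")
```

In this invented example the direct fact has low confidence while the indirect support stays high, which is the kind of situation an evaluation that checks only the isolated fact would count as successful unlearning.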

Taxonomy

Topics: Imbalanced Data Classification Techniques