The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework
Aakriti Shah, Thai Le

TL;DR
This paper introduces the SKeB framework to evaluate unlearning in large language models by analyzing how persuasive prompts influence factual recall, revealing size-dependent effectiveness and providing a new tool for assessing unlearning robustness.
Contribution
The paper presents the Stimulus-Knowledge Entanglement-Behavior Framework (SKeB), which uses entanglement metrics to measure how knowledge is entangled in LLMs and how effective unlearning is.
Findings
Persuasive prompts improve factual recall in unlearned models.
The effectiveness of persuasive prompts at recovering unlearned facts depends on model size.
SKeB enables assessment of unlearning completeness and robustness.
Abstract
Unlearning in large language models (LLMs) is crucial for managing sensitive data and correcting misinformation, yet evaluating its effectiveness remains an open problem. We investigate whether persuasive prompting can recall factual knowledge from deliberately unlearned LLMs across models ranging from 2.7B to 13B parameters (OPT-2.7B, LLaMA-2-7B, LLaMA-3.1-8B, LLaMA-2-13B). Drawing on the spreading-activation theories of ACT-R and Hebbian learning, as well as communication principles, we introduce the Stimulus-Knowledge Entanglement-Behavior Framework (SKeB), which models information entanglement via domain graphs and tests whether factual recall in unlearned models is correlated with persuasive framing. We develop entanglement metrics to quantify knowledge activation patterns and evaluate factuality, non-factuality, and hallucination in outputs. Our results show persuasive prompts…
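To make the spreading-activation idea concrete, here is a minimal sketch of how activation might propagate over a small domain graph. It is an illustration only, not the paper's entanglement metrics: the function name `spread_activation`, the `decay` and `steps` parameters, the equal split of energy among neighbors, and the toy graph are all assumptions introduced here.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes over an adjacency-list graph.

    Hypothetical illustration: each step, every node in the frontier passes
    `decay` times its energy to its neighbors, split equally among them.
    This is NOT the metric defined in the paper.
    """
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] = 1.0
    frontier = dict(activation)

    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            share = decay * energy / len(neighbors)
            for neighbor in neighbors:
                next_frontier[neighbor] += share
        # Accumulate what each node received during this step.
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier

    return dict(activation)


# Toy undirected domain graph (made-up node names).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
}

# Concepts mentioned by a prompt act as seeds; activation reaching "d"
# suggests how strongly the prompt is linked to a more distant fact.
result = spread_activation(toy_graph, seeds=["a"])
```

In a sketch like this, a prompt that mentions more neighbors of a target concept sends more activation to it, which is one rough way to think about why a reframed prompt could surface knowledge that a direct question does not.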
