LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning
Joseph Spracklen, Pedram Aghazadeh, Farinaz Koushanfar, Murtuza Jadliwala

TL;DR
This paper introduces Adaptive Unlearning, a post-deployment method that suppresses hallucinations in large language models, such as recommendations of non-existent software packages. It reduces the resulting supply-chain risk without degrading overall model performance.
Contribution
The paper presents a novel hybrid token-level objective and an adaptive discovery loop for surgically unlearning hallucinations in deployed LLMs, improving their security and reliability.
Findings
Reduces package hallucination rates by 81%
Maintains performance on standard coding benchmarks
Effectively isolates hallucination suppression to targeted distributions
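To make the first finding concrete, here is a minimal sketch of how a package hallucination rate and its reduction could be measured. It is not the authors' evaluation code. The registry set `KNOWN_PACKAGES`, the `pip install` regex, the helper names, and the sample completions are all hypothetical, and the sketch assumes the reported 81% is a relative reduction, which this summary does not state.

```python
import re

# Hypothetical stand-in for a registry index. A real check would query
# the package registry rather than a hard-coded set.
KNOWN_PACKAGES = {"numpy", "requests", "pandas"}

# Assumption: package names are taken from `pip install <name>` commands.
INSTALL_RE = re.compile(r"pip install\s+([A-Za-z0-9_.\-]+)")


def extract_packages(completion: str) -> set[str]:
    """Return the lower-cased package names a completion asks to install."""
    return {name.lower() for name in INSTALL_RE.findall(completion)}


def hallucination_rate(completions, known=KNOWN_PACKAGES) -> float:
    """Fraction of recommended packages that are absent from `known`."""
    total = 0
    fake = 0
    for text in completions:
        for pkg in extract_packages(text):
            total += 1
            if pkg not in known:
                fake += 1
    return fake / total if total else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in rate, e.g. 0.5 means the rate was halved."""
    if before == 0:
        raise ValueError("baseline rate must be non-zero")
    return (before - after) / before


# Toy, made-up completions (not data from the paper).
before_outputs = [
    "pip install numpy",
    "pip install fastjsonx",      # fictional package
    "pip install requests",
    "pip install quickplotlib",   # fictional package
]
after_outputs = [
    "pip install numpy",
    "pip install requests",
    "pip install pandas",
    "pip install fastjsonx",      # fictional package
]

rate_before = hallucination_rate(before_outputs)
rate_after = hallucination_rate(after_outputs)
reduction = relative_reduction(rate_before, rate_after)
```

On this toy data the rate falls from 0.5 to 0.25, a relative reduction of 0.5. Under the same reading, an 81% reduction would leave 19% of the baseline hallucination rate.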
Abstract
Hallucinations, outputs that sound plausible but are factually incorrect, remain an open challenge for deployed LLMs. In code generation, models frequently hallucinate non-existent software packages, recommending imports and installation commands for fictional libraries. This creates a critical supply-chain vulnerability: an attacker can proactively register such packages on public registries with malicious payloads that are subsequently installed and executed by developers or autonomous agents, a class of package confusion attack known as slopsquatting. Once a model is deployed, mitigating this failure mode is difficult: full retraining is costly, and existing approaches either cause severe degradation of model utility or rely on a pre-specified forget-set, an assumption that does not apply to the unbounded space of hallucinations. To address this problem, we present Adaptive…
