FIT to Forget: Robust Continual Unlearning for Large Language Models
Xiaoyu Xu, Minxin Du, Kun Fang, Yaxin Xiao, Zhicong Huang, Cheng Hong, Qingqing Ye, Haibo Hu

TL;DR
This paper introduces FIT, a robust continual unlearning framework for large language models that handles sequential deletion requests while preserving utility and resisting catastrophic forgetting.
Contribution
The paper proposes the FIT framework, built on three mechanisms (redundancy filtering, importance-aware adaptive algorithm selection, and targeted layer attribution), together with a comprehensive benchmark for evaluating continual unlearning in LLMs.
Findings
FIT outperforms existing methods in unlearning efficacy and utility preservation.
It maintains strong downstream performance after hundreds of sequential requests.
FIT shows resilience against relearning and recovery attacks.
Abstract
While large language models (LLMs) exhibit remarkable capabilities, they increasingly face demands to unlearn memorized privacy-sensitive, copyrighted, or harmful content. Existing unlearning methods primarily focus on single-shot scenarios, whereas real-world deletion requests arrive continually. Naïvely applying these methods to sequential requests leads to severe utility degradation and catastrophic forgetting. To address this, we propose FIT, a robust continual unlearning framework to process high-volume sequential deletion streams while resisting both catastrophic forgetting and post-unlearning recovery. FIT stabilizes sequential updates through three synergistic mechanisms: redundancy Filtering, Importance-aware adaptive algorithm selection, and Targeted layer attribution. Furthermore, to facilitate rigorous evaluation, we…
