On Large Language Model Continual Unlearning
Chongyang Gao, Lixu Wang, Kaize Ding, Chenkai Weng, Xiao Wang, Qi Zhu

TL;DR
This paper introduces the O^3 framework for continual unlearning in large language models. It removes the influence of undesired data without retaining previous data, while balancing unlearning effectiveness against utility preservation.
Contribution
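The paper proposes the O^3 framework, which combines an orthogonal low-rank adapter (LoRA) with an out-of-distribution (OOD) detector to handle continual unlearning requests without retaining previous data, improving both unlearning effectiveness and utility preservation.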
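As a rough illustration of what "orthogonal" can mean for successive LoRA adapters, the sketch below computes a penalty that is zero only when the rows of a new adapter matrix are orthogonal to the rows of every earlier one. The function names, the toy matrices, and the squared-dot-product form of the penalty are assumptions made for illustration; the summary above does not state the exact loss used in the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def orthogonality_penalty(a_new, a_prev_list):
    """Sum of squared dot products between rows of a_new and rows of
    every earlier adapter matrix (hypothetical regularizer).

    Each matrix is a list of rows with shape (rank, hidden_dim). The
    penalty is 0 exactly when the new rows are orthogonal to all
    earlier rows.
    """
    return sum(
        dot(prev_row, new_row) ** 2
        for a_prev in a_prev_list
        for prev_row in a_prev
        for new_row in a_new
    )


# Toy adapters with rank 2 and hidden size 4 (illustrative values only).
a_first = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0]]
a_orthogonal = [[0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]]
a_overlapping = [[1.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]]

penalty_orthogonal = orthogonality_penalty(a_orthogonal, [a_first])
penalty_overlapping = orthogonality_penalty(a_overlapping, [a_first])
```

Adding such a term to the training loss of each new adapter would discourage it from reusing directions occupied by earlier adapters, which is one plausible way to limit interference between successive unlearning requests.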
Findings
O^3 outperforms state-of-the-art unlearning methods across multiple datasets.
It balances unlearning effectiveness and utility preservation when unlearning requests arrive continually.
The framework does not require access to previous data during unlearning.
Abstract
While large language models have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning has emerged as a representative approach for model safety and security by removing the influence of undesired data on the target model. However, these methods do not sufficiently consider that unlearning requests in real-world scenarios are continuously emerging, especially in the context of LLMs, which may lead to accumulated model utility loss that eventually becomes unacceptable. Moreover, existing LLM unlearning methods often ignore previous data access limitations due to privacy concerns and copyright protection. Without previous data, the utility preservation during unlearning is much harder. To overcome these challenges, we propose the O^3 framework that includes an Orthogonal low-rank adapter (LoRA) for…
Peer Reviews
Decision: ICLR 2025 Poster
1. This paper is well-written and motivated by real-world scenarios that require unlearning. 2. The design choices for the proposed unlearning pipeline are justified through ablation studies.
The authors partially motivate machine unlearning as “a representative approach for model safety and security by removing the influence of undesired data on the target model.” I very much agree with this assertion. However, the evaluations of the proposed method mainly focus on unlearning knowledge rather than unsafe behaviors. The paper would benefit from additional evaluations on safety-oriented unlearning benchmarks such as WMDP [1]. [1] Li, N., Pan, A., Gopa
1. The authors address a critical challenge of LLM unlearning by removing the need for access to retained data. The orthogonal LoRA design yields significant improvements in the evaluation. 2. The authors conduct extensive experiments to evaluate the effectiveness of the proposed O^3 method.
1. During inference, each testing instance x is fed into all OOD detector backbones. This might limit the method's scalability as the number of unlearning requests grows, because the computational cost rises with it. 2. The experiments focus on QA datasets, where each query contains only a single knowledge entity to be unlearned. The authors might need to evaluate the framework under more challenging and realistic settings, wherein for each query, there might be multiple knowledge entities to b
This paper addresses LLM unlearning from the perspective of continual unlearning. The unlearning process does not require retained data.
The proposed method, which combines LoRA and an OOD detector, does not actually unlearn the knowledge from the LLM. The two components act like modules mounted outside the LLM that block inputs and outputs related to the unlearning targets.
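The inference-time behavior the reviewers describe (every input is scored by each OOD detector, and the unlearning adapter takes effect only on a match) can be sketched as follows. This is a minimal illustration under stated assumptions: the `route` function, the keyword-based toy detectors, the score range, and the hard threshold are all hypothetical stand-ins and are not taken from the paper.

```python
from typing import Callable, Optional, Sequence

# Assumed interface: a detector maps a query to a score in [0, 1], where a
# higher score means the query looks more like that request's unlearning data.
Detector = Callable[[str], float]


def route(query: str, detectors: Sequence[Detector],
          threshold: float = 0.5) -> Optional[int]:
    """Return the index of the best-matching unlearning request, or None.

    Every detector is evaluated once per query, so the cost grows linearly
    with the number of unlearning requests (the scalability concern raised
    in the reviews).
    """
    scores = [detector(query) for detector in detectors]
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    # None means: answer with the original model, no adapter applied.
    return best if scores[best] >= threshold else None


def keyword_detector(keyword: str) -> Detector:
    """Toy stand-in for a learned OOD detector: 1.0 if the keyword appears."""
    return lambda query: 1.0 if keyword in query.lower() else 0.0


detectors = [keyword_detector("alice"), keyword_detector("bob")]
routed_bob = route("Where does Bob live?", detectors)
routed_none = route("What is the capital of France?", detectors)
```

In this sketch a returned index would select which adapter to activate, while `None` leaves the base model untouched, which matches the reviewer's picture of modules mounted outside the LLM.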
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Adapter
