Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
Haoyu Tang, Ye Liu, Xi Zhao, Xukai Liu, Yanghai Zhang, Kai Zhang, Xiaofang Zhou, Enhong Chen

TL;DR
This paper proposes an iterative unlearning framework for generative language models that effectively removes sensitive information while preserving model performance, addressing privacy concerns and data unavailability issues.
Contribution
The paper introduces the ICU framework with three modules for targeted unlearning, preserving model capabilities, and iterative refinement, advancing privacy-preserving NLP techniques.
Findings
Effective removal of sensitive data demonstrated
Maintains overall model performance
Applicable without access to original training data
Abstract
Recent advances in machine learning, particularly in Natural Language Processing (NLP), have produced powerful models trained on vast datasets. However, these models risk leaking sensitive information, raising privacy concerns. In response, regulatory measures such as the European Union's General Data Protection Regulation (GDPR) have driven increasing interest in Machine Unlearning techniques, which enable models to selectively forget specific data entries. Early unlearning approaches primarily relied on pre-processing methods, while more recent research has shifted towards training-based solutions. Despite their effectiveness, a key limitation persists: most methods require access to original training data, which is often unavailable. Additionally, directly applying unlearning techniques bears the cost of undermining the model's expressive capabilities. To address these challenges, we…
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
1. The retrieved samples and the introduced stopping criterion help balance unlearning against preserving the model's original capabilities. 2. The authors run experiments that validate the effectiveness of ICU, along with studies on parameter sensitivity.
1. ICU still seems to require access to a large relevant set from which the related data can be retrieved. However, such a set might not always be available, especially when unlearning privacy-related data. In such cases, it is hard or impossible to retrieve a large amount of similar data, which makes ICU less generalizable to other unlearning settings. Recent approaches that use representation unlearning seem to avoid such data issues ([1]), and the authors do not appear to discuss them.
1. The paper is well written and easy to follow. 2. The method proposed in the paper has been empirically validated, and the ablation experiments demonstrate the effectiveness of each component.
1. The experiments in the paper lack comparisons with a substantial number of prior methods, including [1], [2], and [3]. 2. The authors should consider conducting experiments on larger and more realistic benchmarks to test the method's effectiveness in erasing specific knowledge in practical applications. For instance, the WMDP benchmark [4] may serve as a viable testbed for such a method. [1]. Pawelczyk M, Neel S, Lakkaraju H. In-Context Unlearning: Language Models as Few-Shot Unlearners[C]//Fo
The proposed method is innovative, particularly in its use of KL-divergence to align the output distribution of the unlearned model with the original model on the paired dataset $D_{lrn}$, helping to maintain the coherence and meaningfulness of generated text. The paper is well-written, clear, and structured, making the methodology easy to follow. Figure 2 provides a clear, comprehensive overview of the approach. Additionally, the paper addresses a well-motivated problem, enhancing its relevance
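The KL-divergence alignment mentioned in this review can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the direction of the divergence (original model as reference), the per-token averaging, and the toy logits are all hypothetical, and plain Python stands in for a deep learning framework.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def kl_alignment_loss(original_logits, unlearned_logits):
    """Mean per-token KL(original || unlearned) over one sequence.

    Assumption: each argument is a list of per-token logit vectors
    produced by the two models on the same retained sample.
    """
    per_token = [
        kl_divergence(softmax(o), softmax(u))
        for o, u in zip(original_logits, unlearned_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy logits for a 2-token sequence over a 3-word vocabulary (hypothetical).
original = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]
drifted = [[-1.0, 0.5, 2.0], [3.0, 0.1, 0.1]]

same_loss = kl_alignment_loss(original, original)
drift_loss = kl_alignment_loss(original, drifted)
```

The loss is zero when the unlearned model reproduces the original model's output distribution on the retained sample and grows as the two distributions diverge, which is the sense in which such a term would discourage degraded generation.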
1. Epoch is featured prominently in Table 1 of the main paper, but the authors offer no explanation of its relevance to their proposed method. What is considered better here? How should the reader interpret these numbers? If my understanding of the `epoch` metric is correct from the prior work cited by the authors, dubbed KUMPR, it would appear that the comparison with this prior work is conducted unfairly. Table 1 from this paper reported Epochs 3.2, 2.0, and 6.6 for KUMPR but upon inspecting th
- The proposed ICU framework is easy to adopt and performs well compared to other unlearning techniques. - Extensive experiments across different model sizes and datasets. - The paper is generally well-written
## Major - Information entropy measures unpredictability, or in other words, uncertainty. However, entropy does not correspond to "information content". It may instead indicate that the generation is not fluent. - Provide citation(s) to support the claim that higher information entropy corresponds to higher "information content". - L369: How are these thresholds selected? - No explanation of why the GPT evaluation does not seem to strongly correlate with the other evaluation metrics. ##
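The reviewer's point about entropy can be illustrated with a small sketch. It assumes entropy is computed as Shannon entropy over the unigram token frequencies of a generated text; how the paper actually computes its entropy metric is not stated in this excerpt, and the function name and example sentences are hypothetical.

```python
import math
from collections import Counter

def token_entropy(tokens):
    """Shannon entropy (in bits) of the unigram token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A fluent sentence and a scrambled version built from the same tokens.
fluent = "the model forgets the target sequence".split()
scrambled = sorted(fluent, reverse=True)

fluent_entropy = token_entropy(fluent)
scrambled_entropy = token_entropy(scrambled)

# A degenerate, repetitive output for comparison.
repetitive_entropy = token_entropy(["the"] * 6)
```

Under this assumption, the fluent and scrambled texts receive exactly the same score because they share the same token counts, so a high value by itself says nothing about whether the generation is fluent or meaningful. Only the repetitive output is penalized.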
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Contrastive Learning
