Mitigating Memorization in Language Models
Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney

TL;DR
This paper explores methods to reduce memorization in language models. It introduces five new unlearning techniques and TinyMem, a suite of small LMs for rapidly testing mitigation methods, and shows that unlearning mitigates memorization effectively and efficiently.
Contribution
The paper introduces five new unlearning methods and the TinyMem suite of small LMs. It also provides a comprehensive comparison of 17 mitigation techniques that highlights the advantages of unlearning.
Findings
Unlearning methods are faster and more effective than regularizer-based and fine-tuning-based approaches.
BalancedSubnet outperforms other methods in removing memorized data.
Regularizer-based methods are slow and ineffective at reducing memorization.
Abstract
Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example, when data are private or sensitive. In this work, we investigate methods to mitigate memorization: three regularizer-based, three finetuning-based, and eleven machine unlearning-based methods, with five of the latter being new methods that we introduce. We also introduce TinyMem, a suite of small, computationally-efficient LMs for the rapid development and evaluation of memorization-mitigation methods. We demonstrate that the mitigation methods that we develop using TinyMem can successfully be applied to production-grade LMs, and we determine via experiment that: regularizer-based mitigation methods are slow and ineffective at curbing…
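The notion of memorization above (a query leading to verbatim regurgitation of training data) can be made concrete with a small check. In this sketch, a training sequence counts as memorized if the model's greedy continuation of its first tokens reproduces the remaining tokens exactly. The prefix/suffix split, the use of greedy decoding, and the function names (`greedy_continue`, `is_memorized`, `percent_memorized`) are illustrative assumptions, not the paper's exact evaluation protocol. `toy_model` is a hypothetical lookup table standing in for a real LM's next-token step.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an LM: maps a token context to its greedy (argmax) next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_continue(model: NextToken, prompt: Sequence[int], n_tokens: int) -> List[int]:
    """Greedily generate n_tokens that continue the prompt."""
    context = list(prompt)
    generated: List[int] = []
    for _ in range(n_tokens):
        nxt = model(context)
        generated.append(nxt)
        context.append(nxt)
    return generated


def is_memorized(model: NextToken, sequence: Sequence[int], prefix_len: int) -> bool:
    """True if prompting with the prefix regurgitates the suffix verbatim (assumed criterion)."""
    prefix, suffix = sequence[:prefix_len], sequence[prefix_len:]
    return greedy_continue(model, prefix, len(suffix)) == list(suffix)


def percent_memorized(model: NextToken, sequences: Sequence[Sequence[int]], prefix_len: int) -> float:
    """Percentage of sequences that the model reproduces exactly from their prefixes."""
    if not sequences:
        return 0.0
    hits = sum(is_memorized(model, s, prefix_len) for s in sequences)
    return 100.0 * hits / len(sequences)


# Hypothetical toy model: it has "memorized" one training sequence and otherwise predicts 0.
TRAIN_SEQ = [5, 3, 8, 1, 9, 2]


def toy_model(context: Sequence[int]) -> int:
    n = len(context)
    if n < len(TRAIN_SEQ) and list(context) == TRAIN_SEQ[:n]:
        return TRAIN_SEQ[n]
    return 0


print(percent_memorized(toy_model, [TRAIN_SEQ, [5, 3, 7, 7, 7, 7]], prefix_len=3))  # prints 50.0
```

The first sequence is regurgitated from its 3-token prefix and the second is not, so the score is 50.0. A mitigation method would aim to drive this number down on the memorized data while leaving performance on other data intact.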
Peer Reviews
Decision: ICLR 2025 Spotlight
The authors present a comprehensive comparison of 17 methods for reducing memorization. Applying the TinyMem test bench to larger models (Pythia) demonstrates the generalizability of the results.
No major methodological weaknesses; however, I find this study to be a straightforward comparison of several (existing and new) techniques without any particularly novel insights. While this type of work is useful, the authors should extend it by investigating why the best method works well or by extracting practical insights from the results (besides stating that the method is the most effective).
1. This paper focuses on a critical problem, namely, the challenge of developing and evaluating memorization-elimination methods directly on large models.
2. This paper innovatively proposes a series of small GPT2-style models (TinyMem), which are used to conduct memorization-injection and memorization-mitigation experiments quickly and in a computationally efficient way.
3. This paper proposes five new unlearning-based methods, and among them, BalancedSubnet outperforms other methods at removing me…
1. I am afraid that the evaluation results of different methods on TinyMem cannot truly reflect how these methods perform at eliminating memorization in LLMs. On the one hand, the results in Table 1 and Table 2 show that some unlearning-based methods do not perform consistently across different types of tasks and of memorization elimination. So how can the overall performance of different methods be compared effectively? On the other hand, the types of memorization within LLMs that need to…
1. This paper first introduces TinyMem, a new suite of lightweight models specifically designed for rapid testing of memorization-mitigation strategies. Additionally, the proposal of five new unlearning-based methods, including the innovative BalancedSubnet approach, contributes to the field. This work also highlights a novel way of applying unlearning-based techniques to real-world, production-grade models, bridging the gap between research and application.
2. Extensive experiments invol…
This paper seems to be missing some important unlearning baselines, e.g., Gradient Ascent + Descent and Gradient Ascent + KL divergence [1].
[1] Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, and Xiang Yue. 2024. Machine Unlearning of Pre-trained Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8403–8419, Bangkok, Thailand. Association for Computational Linguistics.
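The two baselines named in this review both pair gradient ascent on the data to be forgotten with a term that protects retained data. Below is a minimal sketch of the two objectives, computed on plain probabilities rather than tensors so the arithmetic is visible. It assumes equal weighting of the terms and a KL divergence measured from the original model's next-token distribution to the current model's on retain data. The function and parameter names are hypothetical, and [1] is the place to check the exact formulation. Each loss is meant to be minimized, and the negated forget-set NLL is what turns ordinary gradient descent into gradient ascent on the forget data.

```python
import math
from typing import Sequence


def nll(target_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood, given the model's probability of each target token (non-empty)."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) over one vocabulary; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ga_gd_loss(forget_probs: Sequence[float], retain_probs: Sequence[float],
               retain_weight: float = 1.0) -> float:
    """Gradient ascent on forget data plus gradient descent on retain data.
    Minimizing this pushes forget-set NLL up while keeping retain-set NLL low.
    retain_weight is a hypothetical knob; equal weighting is assumed by default."""
    return -nll(forget_probs) + retain_weight * nll(retain_probs)


def ga_kl_loss(forget_probs: Sequence[float],
               original_dists: Sequence[Sequence[float]],
               current_dists: Sequence[Sequence[float]],
               kl_weight: float = 1.0) -> float:
    """Gradient ascent on forget data plus a KL penalty that keeps the current model's
    retain-set next-token distributions close to the original model's (assumed direction)."""
    kl = sum(kl_divergence(p, q) for p, q in zip(original_dists, current_dists)) / len(original_dists)
    return -nll(forget_probs) + kl_weight * kl


# If the current model still matches the original on retain data, only the ascent term remains.
print(ga_kl_loss([0.5], [[0.3, 0.7]], [[0.3, 0.7]]))  # equals log(0.5)
```

In an actual unlearning run, these quantities would be computed from the model's logits on batches of forget and retain data and differentiated with respect to the weights. The retain term is what distinguishes both baselines from plain gradient ascent, which can degrade the model on everything it was not asked to forget.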
