The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
Alexander Xiong, Xuandong Zhao, Aneesh Pappu, Dawn Song

TL;DR
This paper reviews how large language models memorize data, examines detection and mitigation techniques, and discusses the implications for privacy and model utility, highlighting future research directions.
Contribution
It provides a comprehensive synthesis of recent research on LLM memorization, including mechanisms, measurement methods, and mitigation strategies, with an emphasis on practical and ethical considerations.
Findings
Training data duplication increases memorization
Detection methods like membership inference are effective
Mitigation strategies include data cleaning and differential privacy
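As a rough illustration of the membership inference idea named in the findings, the sketch below uses a simple loss-threshold attack: an example is flagged as a likely training member when the model's loss on it falls below a cutoff. The loss values, the threshold, and the function names are made up for illustration and are not taken from the paper.

```python
def loss_threshold_mia(losses, threshold):
    """Flag each example as a likely training member when its loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(predictions, is_member):
    """Fraction of examples whose predicted membership matches the ground truth."""
    correct = sum(p == m for p, m in zip(predictions, is_member))
    return correct / len(is_member)


# Hypothetical per-example losses: memorized training examples tend to have lower loss.
losses = [0.2, 0.4, 1.5, 2.1]
is_member = [True, True, False, False]

preds = loss_threshold_mia(losses, threshold=1.0)
accuracy = attack_accuracy(preds, is_member)
```

In practice the threshold would have to be calibrated, for example on held-out data, and real loss distributions for members and non-members usually overlap far more than in this toy case.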
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they also exhibit memorization of their training data. This phenomenon raises critical questions about model behavior, privacy risks, and the boundary between learning and memorization. Addressing these concerns, this paper synthesizes recent studies and investigates the landscape of memorization, the factors influencing it, and methods for its detection and mitigation. We explore key drivers, including training data duplication, training dynamics, and fine-tuning procedures that influence data memorization. In addition, we examine methodologies such as prefix-based extraction, membership inference, and adversarial prompting, assessing their effectiveness in detecting and measuring memorized content. Beyond technical analysis, we also explore the broader implications of memorization,…
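Prefix-based extraction, mentioned in the abstract, can be sketched as follows: feed the model the start of a candidate text and check whether it reproduces the remainder verbatim. This is a minimal sketch under stated assumptions: `generate` is a hypothetical callable standing in for a real model, the comparison is character-level rather than token-level, and the toy "model" is just a lookup over a small memorized corpus.

```python
from typing import Callable, List


def is_extractable(generate: Callable[[str, int], str], text: str, prefix_len: int) -> bool:
    """Return True if the model reproduces the suffix of `text` exactly when given its prefix."""
    prefix, suffix = text[:prefix_len], text[prefix_len:]
    continuation = generate(prefix, len(suffix))
    return continuation == suffix


def make_toy_model(memorized: List[str]) -> Callable[[str, int], str]:
    """Hypothetical stand-in for an LLM: continues any prefix of a memorized document."""
    def generate(prefix: str, max_chars: int) -> str:
        for doc in memorized:
            if doc.startswith(prefix):
                return doc[len(prefix):len(prefix) + max_chars]
        return ""  # nothing memorized for this prefix
    return generate


toy_model = make_toy_model(["the secret code is 1234"])

seen = is_extractable(toy_model, "the secret code is 1234", prefix_len=10)
unseen = is_extractable(toy_model, "an unseen sentence here", prefix_len=10)
```

Here `seen` is true because the toy model continues the memorized document exactly, while `unseen` is false. With a real model, the same check would typically compare token sequences produced by greedy decoding.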
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
