The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation

Alexander Xiong; Xuandong Zhao; Aneesh Pappu; Dawn Song

arXiv:2507.05578 · cs.LG · December 15, 2025

Open Access

TL;DR

This paper reviews how large language models memorize data, examines detection and mitigation techniques, and discusses the implications for privacy and model utility, highlighting future research directions.

Contribution

It provides a comprehensive synthesis of recent research on LLM memorization, including mechanisms, measurement methods, and mitigation strategies, with an emphasis on practical and ethical considerations.

Findings

01. Training data duplication increases memorization

02. Detection methods like membership inference are effective

03. Mitigation strategies include data cleaning and differential privacy

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they also exhibit memorization of their training data. This phenomenon raises critical questions about model behavior, privacy risks, and the boundary between learning and memorization. Addressing these concerns, this paper synthesizes recent studies and investigates the landscape of memorization, the factors influencing it, and methods for its detection and mitigation. We explore key drivers, including training data duplication, training dynamics, and fine-tuning procedures that influence data memorization. In addition, we examine methodologies such as prefix-based extraction, membership inference, and adversarial prompting, assessing their effectiveness in detecting and measuring memorized content. Beyond technical analysis, we also explore the broader implications of memorization,…
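As a rough illustration of the prefix-based extraction idea named in the abstract, the sketch below prompts a model with the first tokens of a sample and checks whether it reproduces the rest verbatim. This is a minimal sketch under stated assumptions, not the paper's procedure: the function names (`is_memorized`, `extraction_rate`, `make_toy_model`), the `generate(prefix, n)` interface, and the lookup-table "model" are all hypothetical stand-ins, and a real study would use an actual LLM with greedy decoding.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: generate(prefix_tokens, n) -> next n tokens.
Generate = Callable[[Sequence[str], int], List[str]]


def is_memorized(generate: Generate, prefix: Sequence[str], true_suffix: Sequence[str]) -> bool:
    """Return True if the model continues `prefix` with exactly `true_suffix`."""
    continuation = generate(prefix, len(true_suffix))
    return list(continuation) == list(true_suffix)


def extraction_rate(generate: Generate, samples: Sequence[Sequence[str]], prefix_len: int) -> float:
    """Fraction of samples whose suffix is reproduced verbatim from a prefix of `prefix_len` tokens."""
    hits = 0
    total = 0
    for tokens in samples:
        if len(tokens) <= prefix_len:
            continue  # too short to split into a prefix and a non-empty suffix
        total += 1
        prefix, suffix = tokens[:prefix_len], tokens[prefix_len:]
        if is_memorized(generate, prefix, suffix):
            hits += 1
    return hits / total if total else 0.0


def make_toy_model(corpus: Sequence[Sequence[str]]) -> Generate:
    """Assumed stand-in for an LLM: continues a prefix only if it opens a 'training' document."""
    def generate(prefix: Sequence[str], n: int) -> List[str]:
        for doc in corpus:
            if list(doc[:len(prefix)]) == list(prefix):
                return list(doc[len(prefix):len(prefix) + n])
        return []  # unseen prefix: the toy model produces nothing
    return generate


if __name__ == "__main__":
    seen = ["the cat sat on the mat".split()]
    unseen = ["a dog ran in the park".split()]
    model = make_toy_model(seen)
    print(extraction_rate(model, seen + unseen, prefix_len=3))
```

The exact-match criterion is the strictest choice; an approximate-match variant (for example, token overlap above a threshold) would flag near-verbatim reproductions as well, at the cost of more false positives.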

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education