Rethinking Memorization Measures and their Implications in Large Language Models
Bishwamittra Ghosh, Soumi Das, Qinyuan Wu, Mohammad Aflah Khan, Krishna P. Gummadi, Evimaria Terzi, Deepak Garg

TL;DR
This paper critically re-evaluates memorization measures in large language models, revealing that optimal learning cannot fully eliminate memorization, and that different measures often disagree on what constitutes memorization, impacting privacy assessments.
Contribution
It introduces a new measure called contextual memorization, compares it with existing measures, and analyzes their implications for privacy and learning in large language models.
Findings
Memorization measures disagree on string memorization rankings.
Optimal learning cannot completely prevent partial memorization.
Improved learning reduces contextual and counterfactual memorization but increases recollection-based memorization.
Abstract
Because of the privacy threats it poses, memorization in LLMs is often seen as undesirable, specifically for learning. In this paper, we study whether memorization can be avoided when optimally learning a language, and whether the privacy threat posed by memorization is exaggerated. To this end, we re-examine existing privacy-focused measures of memorization, namely recollection-based and counterfactual memorization, along with a newly proposed measure, contextual memorization. Relating memorization to local over-fitting during learning, contextual memorization aims to disentangle memorization from the contextual learning ability of LLMs. Informally, a string is contextually memorized if its recollection due to training exceeds the optimal contextual recollection, a learned threshold denoting the best contextual learning without training. Conceptually, contextual recollection avoids the fallacy…
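The informal definition of contextual memorization given in the abstract can be sketched as a simple threshold test. This is a minimal illustration only, not the authors' implementation: the names `StringScores`, `trained_recollection`, `optimal_contextual_recollection`, and both functions are hypothetical, the scores are assumed to be numbers where higher means better recollection, and the example values are made up. How the recollection scores and the threshold are actually estimated is not described in the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StringScores:
    """Hypothetical per-string scores; not taken from the paper."""
    text: str
    # Assumed: recollection of the string by the model after training on it.
    trained_recollection: float
    # Assumed: the threshold from the abstract, i.e. the best recollection
    # achievable through contextual learning without training on the string.
    optimal_contextual_recollection: float


def is_contextually_memorized(s: StringScores) -> bool:
    # Informal rule from the abstract: the string counts as contextually
    # memorized when recollection due to training exceeds the threshold.
    return s.trained_recollection > s.optimal_contextual_recollection


def contextually_memorized(strings: List[StringScores]) -> List[str]:
    # Keep only the strings that pass the threshold test, in input order.
    return [s.text for s in strings if is_contextually_memorized(s)]


# Made-up example values, for illustration only.
examples = [
    StringScores("string A", trained_recollection=0.9,
                 optimal_contextual_recollection=0.6),
    StringScores("string B", trained_recollection=0.5,
                 optimal_contextual_recollection=0.7),
]
flagged = contextually_memorized(examples)
```

With these made-up values, only "string A" is flagged: "string B" is recollected less well after training than context alone would allow, so under this reading it is not counted as memorized.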
