Demystifying Verbatim Memorization in Large Language Models
Jing Huang, Diyi Yang, Christopher Potts

TL;DR
This paper investigates how large language models memorize sequences verbatim. It finds that memorization depends on the amount of repetition, the model checkpoint, and high-level features encoded in model states, and that current unlearning methods struggle to remove memorized sequences without harming model performance.
Contribution
The study introduces a controlled framework for analyzing verbatim memorization in LLMs and provides new insights into its mechanisms and challenges for unlearning.
Findings
Verbatim memorization requires non-trivial repetition.
Later checkpoints are more prone to memorization.
Unlearning methods often degrade model performance.
Abstract
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications. Much prior work has studied such verbatim memorization using observational data. To complement such work, we develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences. We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to verbatim memorize sequences, even for out-of-distribution sequences; (3) the generation of memorized sequences is triggered by distributed model states that encode high-level features and makes important use of general language modeling capabilities. Guided by these insights, we develop stress tests to evaluate unlearning methods and find they…
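As a rough illustration of the controlled setup described in the abstract, the sketch below injects a fixed token sequence into a stream of training batches at a regular interval and measures how much of a reference continuation a model output reproduces verbatim. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `inject_sequence` and `verbatim_match_length`, the `every` parameter, and the choice to overwrite the first example of a batch are all hypothetical, and no model or tokenizer is involved.

```python
from typing import Iterable, Iterator, List, Sequence


def inject_sequence(
    batches: Iterable[List[List[int]]],
    injected: List[int],
    every: int,
) -> Iterator[List[List[int]]]:
    """Yield batches, overwriting the first example of every `every`-th batch.

    Assumption: injection replaces one example rather than appending one,
    so the batch size stays constant. The source does not specify this.
    """
    if every < 1:
        raise ValueError("every must be a positive integer")
    for step, batch in enumerate(batches, start=1):
        if step % every == 0 and batch:
            # Copy so the caller's data is not mutated.
            batch = [list(injected)] + [list(example) for example in batch[1:]]
        yield batch


def verbatim_match_length(generated: Sequence[int], reference: Sequence[int]) -> int:
    """Return the number of leading tokens in `generated` that match `reference`."""
    matched = 0
    for gen_token, ref_token in zip(generated, reference):
        if gen_token != ref_token:
            break
        matched += 1
    return matched


# Toy usage: six batches of two examples each, injecting every third batch.
toy_batches = [[[1, 2], [3, 4]] for _ in range(6)]
mixed = list(inject_sequence(toy_batches, injected=[9, 9], every=3))
injected_count = sum(1 for batch in mixed if batch[0] == [9, 9])
```

Lowering `every` raises the number of times the model sees the injected sequence, which is the repetition variable the findings refer to. The prefix-match length is only one possible proxy for verbatim memorization.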
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Pythia
