Rethinking LLM Memorization through the Lens of Adversarial Compression
Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

TL;DR
This paper introduces the Adversarial Compression Ratio (ACR), a metric for assessing whether a large language model has memorized a training string: the string counts as memorized if an adversarial prompt with fewer tokens than the string itself can elicit it. The authors position the metric as an aid to legal and ethical evaluations.
Contribution
The paper proposes the ACR metric as a novel, adversarial approach to quantify memorization in LLMs, addressing limitations of previous methods and enabling practical, low-cost assessments.
Findings
ACR effectively measures memorization in LLMs.
ACR provides a flexible, adversarial perspective on data memorization.
The metric can be used for legal compliance and unlearning monitoring.
Abstract
Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself -- in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for…
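The compression criterion described in the abstract can be illustrated with a minimal sketch. It assumes the ratio is taken as the token length of the target string divided by the token length of the prompt that elicits it, so a value above 1 means the prompt is shorter than the string. The function names, the `threshold` parameter, and the toy token IDs are hypothetical and chosen for illustration; the search for a short adversarial prompt is a separate optimization step that is not shown here.

```python
from typing import Sequence


def adversarial_compression_ratio(
    target_tokens: Sequence[int], prompt_tokens: Sequence[int]
) -> float:
    """Ratio of target length to eliciting-prompt length, both in tokens.

    Assumption: the prompt has already been verified to make the model
    output the target string exactly; this function only compares lengths.
    """
    if len(prompt_tokens) == 0:
        raise ValueError("prompt must contain at least one token")
    return len(target_tokens) / len(prompt_tokens)


def is_memorized(
    target_tokens: Sequence[int],
    prompt_tokens: Sequence[int],
    threshold: float = 1.0,  # hypothetical cutoff: prompt shorter than target
) -> bool:
    """Flag the target as memorized when the ratio exceeds the threshold."""
    return adversarial_compression_ratio(target_tokens, prompt_tokens) > threshold


# Toy example with made-up token IDs: a 12-token target elicited by a 4-token prompt.
target = list(range(12))
prompt = list(range(4))
ratio = adversarial_compression_ratio(target, prompt)
memorized = is_memorized(target, prompt)
```

Raising `threshold` above 1 would correspond to the "(much) shorter" reading in the abstract, where only strongly compressed strings are counted as memorized.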
Taxonomy
Topics: Natural Language Processing Techniques
