Rethinking LLM Memorization through the Lens of Adversarial Compression

Avi Schwarzschild; Zhili Feng; Pratyush Maini; Zachary C. Lipton; J. Zico Kolter

arXiv:2404.15146 · cs.LG · November 13, 2024

Open Access

TL;DR

This paper introduces the Adversarial Compression Ratio (ACR), a new metric to assess whether large language models memorize training data by measuring if training strings can be compressed via adversarial prompts, aiding legal and ethical evaluations.

Contribution

The paper proposes the ACR metric as a novel, adversarial approach to quantify memorization in LLMs, addressing limitations of previous methods and enabling practical, low-cost assessments.

Findings

01. ACR effectively measures memorization in LLMs.

02. ACR provides a flexible, adversarial perspective on data memorization.

03. The metric can be used for legal compliance and unlearning monitoring.

Abstract

Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself -- in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for…
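The definition in the abstract can be illustrated with a minimal sketch. It assumes the ratio is the target string's token count divided by the eliciting prompt's token count, so a value above 1 means the prompt is shorter than the string it elicits. The function names, the `generate` callable, and the lookup-table "model" are all hypothetical stand-ins for illustration; the search for an adversarial prompt itself is not shown.

```python
from typing import Callable, List, Sequence


def compression_ratio(target_tokens: Sequence[str], prompt_tokens: Sequence[str]) -> float:
    """Ratio of target length to prompt length, both counted in tokens."""
    if len(prompt_tokens) == 0:
        raise ValueError("prompt must contain at least one token")
    return len(target_tokens) / len(prompt_tokens)


def is_memorized(
    target_tokens: Sequence[str],
    prompt_tokens: Sequence[str],
    generate: Callable[[Sequence[str]], List[str]],
) -> bool:
    """True if the prompt elicits the target exactly AND is shorter than it."""
    elicited = list(generate(prompt_tokens)) == list(target_tokens)
    return elicited and compression_ratio(target_tokens, prompt_tokens) > 1.0


# Hypothetical stand-in for a language model: a fixed prompt-to-output table.
_TOY_TABLE = {("p1", "p2"): ["to", "be", "or", "not", "to", "be"]}


def toy_generate(prompt_tokens: Sequence[str]) -> List[str]:
    """Return the tabulated output for a prompt, or nothing if it is unknown."""
    return _TOY_TABLE.get(tuple(prompt_tokens), [])


target = ["to", "be", "or", "not", "to", "be"]
short_prompt = ["p1", "p2"]

print(compression_ratio(target, short_prompt))           # 6 tokens / 2 tokens
print(is_memorized(target, short_prompt, toy_generate))  # elicited by a shorter prompt
```

In this toy setting a two-token prompt elicits a six-token string, so the string counts as memorized under the assumed criterion; a prompt that is as long as the target, or that fails to elicit it, does not.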

Taxonomy

Topics: Natural Language Processing Techniques