Extracting Memorized Training Data via Decomposition
Ellen Su, Anu Vellore, Amy Chang, Raffaele Mura, Blaine Nelson, Paul Kassianik, Amin Karbasi

TL;DR
This paper presents a simple, query-based decomposition method that extracts training data from large language models without modifying them, revealing potential security and privacy vulnerabilities.
Contribution
It introduces a novel, generalizable technique for extracting training data from LLMs through instruction decomposition, highlighting security risks.
Findings
Successfully extracted verbatim sentences from news articles
Revealed that LLMs can reproduce source training data
Method does not require fine-tuning or model modification
Abstract
The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to reveal the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73…
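The abstract reports results in terms of articles from which at least one verbatim sentence was extracted. The following is a minimal sketch of how such a check could be computed. It is not the authors' evaluation code: the sentence splitter, the whitespace normalization, the `min_words` threshold, and all function names are assumptions made for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not hide a match."""
    return " ".join(text.split())


def verbatim_sentences(article: str, model_output: str, min_words: int = 5) -> list[str]:
    """Return article sentences that appear word-for-word in the model output.

    min_words is a hypothetical threshold that skips very short sentences,
    which could match by coincidence.
    """
    haystack = normalize(model_output)
    hits = []
    for sentence in split_sentences(article):
        candidate = normalize(sentence)
        if len(candidate.split()) >= min_words and candidate in haystack:
            hits.append(candidate)
    return hits


def extraction_rate(articles: list[str], outputs: list[str]) -> float:
    """Fraction of articles with at least one verbatim sentence in the paired output."""
    if not articles:
        return 0.0
    matched = sum(1 for a, o in zip(articles, outputs) if verbatim_sentences(a, o))
    return matched / len(articles)
```

Exact substring matching is deliberately strict: a paraphrased or lightly edited sentence does not count, which matches the "verbatim" wording in the abstract but may undercount near-duplicates.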
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
