Scalable Extraction of Training Data from (Production) Language Models

Milad Nasr; Nicholas Carlini; Jonathan Hayase; Matthew Jagielski; A. Feder Cooper; Daphne Ippolito; Christopher A. Choquette-Choo; Eric Wallace; Florian Tramèr; Katherine Lee

arXiv:2311.17035 · cs.LG · November 29, 2023 · 81 cites

TL;DR

This paper demonstrates that large language models, including closed and aligned ones like ChatGPT, can have their training data efficiently extracted through novel and existing attack methods, revealing persistent memorization.

Contribution

It introduces a new divergence attack to extract training data from aligned models and shows that current alignment does not prevent memorization.

Findings

01. Adversaries can extract gigabytes of training data from open-source, semi-open, and closed models.

02. The divergence attack causes ChatGPT to emit training data at a rate 150x higher than when it behaves properly.

03. Existing techniques suffice to attack unaligned models, but new methods are needed for aligned models.

Abstract

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
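One way to make "extractable memorization" concrete is to check whether model generations contain long verbatim token runs that also appear in a reference corpus. The sketch below is illustrative only and is not presented as the authors' pipeline: the function names, whitespace tokenization, in-memory set index, and window length `n` are all assumptions chosen for brevity.

```python
from typing import Iterable, Set, Tuple

Ngram = Tuple[str, ...]


def build_ngram_index(corpus: Iterable[str], n: int) -> Set[Ngram]:
    """Collect every n-token window (whitespace tokens) from a reference corpus.

    Assumption: the corpus fits in memory; a real corpus would need a
    more scalable index than a Python set.
    """
    index: Set[Ngram] = set()
    for doc in corpus:
        tokens = doc.split()
        for i in range(len(tokens) - n + 1):
            index.add(tuple(tokens[i:i + n]))
    return index


def memorized_fraction(generations: Iterable[str], index: Set[Ngram], n: int) -> float:
    """Fraction of generations with at least one n-token window found in the index."""
    generations = list(generations)
    if not generations:
        return 0.0
    hits = 0
    for text in generations:
        tokens = text.split()
        windows = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        if any(window in index for window in windows):
            hits += 1
    return hits / len(generations)


# Toy data, invented for illustration only.
corpus = ["the quick brown fox jumps over the lazy dog"]
index = build_ngram_index(corpus, n=4)
generations = [
    "he said the quick brown fox ran",  # shares a 4-token run with the corpus
    "nothing to see here at all",       # shares no 4-token run
]
fraction = memorized_fraction(generations, index, n=4)
print(fraction)
```

Under these assumptions, a relative figure such as the 150x in the abstract would correspond to the ratio of this fraction under the attack prompt to the fraction under ordinary prompts, each measured over the same number of generated tokens.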

Videos

Scalable Extraction of Training Data from (Production) Language Models (Paper Explained)· youtube

The Weird ChatGPT Hack That Leaked Training Data [Dr. Yannic Kilcher / Prof. Florian Tramer]· youtube

Scalable Extraction of Training Data from (Production) Language Models· youtube

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)

Methods: Pythia · GPT-Neo