Extracting Training Data from Diffusion Models

Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

arXiv:2301.13188 · cs.CR · January 31, 2023 · 96 citations

TL;DR

This paper demonstrates that diffusion models memorize training images and can leak them at generation time, raising privacy concerns and highlighting the need for privacy-preserving training methods.

Contribution

It reveals the privacy vulnerabilities of diffusion models by extracting training data and analyzes how different training choices impact privacy risks.

Findings

1. Over a thousand training images were extracted from state-of-the-art diffusion models.

2. Diffusion models are much less private than prior generative models such as GANs.

3. Mitigating these privacy risks may require new advances in privacy-preserving training.

Abstract

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
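The abstract names a generate-and-filter pipeline but does not describe its internals. The sketch below is one plausible reading, not the authors' implementation: sample a prompt many times, then flag generations that have several near-identical siblings, on the assumption that a memorized image is reproduced almost exactly across samples. The function names, the Euclidean distance, the `threshold` and `min_neighbors` values, and the random toy data standing in for model outputs are all assumptions made for illustration.

```python
import numpy as np


def pairwise_l2(samples):
    """Return the matrix of Euclidean distances between flattened samples."""
    flat = samples.reshape(len(samples), -1).astype(np.float64)
    sq = (flat ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 for rounding error.
    d2 = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
    return np.sqrt(np.maximum(d2, 0.0))


def flag_memorized(samples, threshold, min_neighbors):
    """Flag samples with at least `min_neighbors` other samples within `threshold`.

    Hypothetical filter rule: both parameters are illustrative and would need
    tuning against a real model and image resolution.
    """
    close = pairwise_l2(samples) < threshold
    np.fill_diagonal(close, False)  # a sample is not its own neighbor
    return close.sum(axis=1) >= min_neighbors


# Toy stand-in for the "generate" step: 5 noisy copies of one image
# (mimicking a memorized output) plus 20 unrelated random images.
rng = np.random.default_rng(0)
memorized = rng.random((8, 8))
repeats = memorized + 0.01 * rng.standard_normal((5, 8, 8))
novel = rng.random((20, 8, 8))
samples = np.concatenate([repeats, novel])

flags = flag_memorized(samples, threshold=0.5, min_neighbors=3)
print("flagged indices:", np.flatnonzero(flags))
```

In a real setting the flagged generations would still need to be compared against the training set to confirm that they are copies; this sketch covers only the filtering heuristic.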

Code & Models

Repositories

vita-group/shake-to-leak (PyTorch)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion