Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

TL;DR
This paper investigates whether diffusion models generate original images or replicate training data, using image retrieval techniques to detect content copying across various datasets and models.
Contribution
It introduces image retrieval frameworks to identify content replication in diffusion models and analyzes how training set size influences copying behavior.
Findings
Diffusion models sometimes blatantly copy training data.
Training set size affects the rate of content replication.
Stable Diffusion can produce images that directly copy training samples.
Abstract
Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they replicating content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.
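The retrieval idea in the abstract (compare each generated image against training samples and flag close matches) can be illustrated with a minimal sketch. It assumes every image has already been mapped to a feature vector by some embedding model; the abstract does not say which feature extractor or similarity measure the authors use, so cosine similarity, the function names, and the 0.95 threshold below are all illustrative assumptions rather than the paper's method.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_training_match(
    generated: Sequence[float], training: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, similarity) of the training vector most similar to `generated`."""
    best_index, best_score = -1, -1.0
    for i, train_vec in enumerate(training):
        score = cosine_similarity(generated, train_vec)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def flag_replications(
    generated_set: List[Sequence[float]],
    training: List[Sequence[float]],
    threshold: float = 0.95,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """List (generated_idx, training_idx, similarity) for matches at or above `threshold`."""
    flagged = []
    for g_idx, gen_vec in enumerate(generated_set):
        t_idx, score = nearest_training_match(gen_vec, training)
        if score >= threshold:
            flagged.append((g_idx, t_idx, score))
    return flagged


# Toy feature vectors standing in for real image embeddings.
training_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
generated_features = [[0.99, 0.01, 0.0], [1.0, 1.0, 1.0]]
flags = flag_replications(generated_features, training_features)
```

In this toy run only the first generated vector is flagged, because it is nearly parallel to the first training vector. Dividing the number of flagged samples by the number of generated samples gives a simple replication rate that could be compared across training set sizes, under the same assumptions.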
Code & Models
- 🤗 aipicasso/cool-japan-diffusion-for-learning-2-0 (model) · 21 downloads · 38 likes
- 🤗 Crosstyan/BPModel (model) · 353 downloads · 150 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-0 (model) · 55 downloads · 65 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-0-beta (model) · 23 downloads · 30 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-1-beta (model) · 32 downloads · 10 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-1 (model) · 40 downloads · 20 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-1-1 (model) · 9 downloads · 3 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-2-beta (model) · 36 downloads · 2 likes
- 🤗 aipicasso/picasso-diffusion-1-1 (model) · 63 downloads · 38 likes
- 🤗 aipicasso/cool-japan-diffusion-2-1-2 (model) · 24 downloads · 15 likes
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Aesthetic Perception and Analysis
Methods: Diffusion
