Extracting Training Data from Unconditional Diffusion Models

Yunhao Chen; Xingjun Ma; Difan Zou; Yu-Gang Jiang

arXiv:2406.12752 · cs.CR · October 15, 2024

Open Access

TL;DR

This paper develops a theoretical framework and new methods for extracting training data from diffusion models, revealing their memorization properties and improving data recovery techniques.

Contribution

It introduces a theoretical analysis of memorization in diffusion models and proposes SIDE, a novel data extraction method that outperforms previous approaches.

Findings

01 SIDE extracts training data from unconditional diffusion models, where prior methods fail.

02 The theoretical analysis provides new insights into memorization in diffusion models.

03 SIDE is over 50% more effective than previous approaches on the CelebA dataset.

Abstract

As diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI), the study of their memorization of the raw training data has attracted growing attention. Existing works in this direction aim to establish an understanding of whether or to what extent DPMs learn by memorization. Such an understanding is crucial for identifying potential risks of data leakage and copyright infringement in diffusion models and, more importantly, for more controllable generation and trustworthy application of Artificial Intelligence Generated Content (AIGC). While previous works have made important observations of when DPMs are prone to memorization, these findings are mostly empirical, and the developed data extraction methods only work for conditional diffusion models. In this work, we aim to establish a theoretical understanding of memorization…

Taxonomy

Topics: Advanced Data Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning

Methods: Diffusion