Leveraging Programmatically Generated Synthetic Data for Differentially Private Diffusion Training

Yujin Choi; Jinseong Park; Junyoung Byun; Jaewook Lee

arXiv:2412.09842·cs.LG·December 16, 2024


TL;DR

This paper introduces DP-SynGen, a method that uses synthetic data at specific stages of diffusion models to improve generative quality while reducing privacy costs.

Contribution

The paper identifies the stages of diffusion model training (coarse and cleaning) in which synthetic data can replace private data, lowering the privacy cost while improving generative performance.

Findings

01. Training with synthetic data improves the quality of generated images.

02. Training certain stages on synthetic data instead of private data reduces the privacy budget spent.

03. Theoretical and empirical analysis validates the method's effectiveness.

Abstract

Programmatically generated synthetic data has been used in differentially private training for classification to enhance performance without privacy leakage. However, because the synthetic data is generated from a random process, its distribution is distinguishable from that of real data and difficult to transfer. As a result, a model trained with the synthetic data generates unrealistic random images, which makes it challenging to adapt the synthetic data for generative models. In this work, we propose DP-SynGen, which leverages programmatically generated synthetic data in diffusion models to address this challenge. By exploiting the three stages of diffusion models (coarse, context, and cleaning), we identify stages where synthetic data can be effectively utilized. We theoretically and empirically verified that cleaning and coarse stages can be trained without private data, replacing…
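The stage-wise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that low-noise timesteps correspond to the cleaning stage and high-noise timesteps to the coarse stage, and that the context stage is the one trained on private data. The function names and the boundary parameters `cleaning_end` and `coarse_start` are hypothetical, and the values in the usage note are arbitrary rather than taken from the paper.

```python
from collections import Counter


def stage_for_timestep(t, num_steps, cleaning_end, coarse_start):
    """Map a diffusion timestep to a stage name.

    Assumption (not from the paper): small t means low noise (cleaning),
    large t means high noise (coarse), and everything between is context.
    """
    if not 0 <= t < num_steps:
        raise ValueError("t must satisfy 0 <= t < num_steps")
    if not 0 < cleaning_end <= coarse_start < num_steps:
        raise ValueError("need 0 < cleaning_end <= coarse_start < num_steps")
    if t < cleaning_end:
        return "cleaning"
    if t >= coarse_start:
        return "coarse"
    return "context"


def data_source_for_stage(stage):
    """Pick the training data for a stage.

    Per the abstract, the cleaning and coarse stages can be trained without
    private data. Using private data for the context stage is an assumption.
    """
    return "private" if stage == "context" else "synthetic"


def count_sources(num_steps, cleaning_end, coarse_start):
    """Count how many timesteps would draw on each data source."""
    return Counter(
        data_source_for_stage(
            stage_for_timestep(t, num_steps, cleaning_end, coarse_start)
        )
        for t in range(num_steps)
    )
```

For example, `count_sources(1000, 100, 700)` assigns 400 timesteps to synthetic data and 600 to private data under these hypothetical boundaries. Only the timesteps that touch private data would need differentially private training, which is the intuition behind the reduced privacy cost.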


Code & Models

Repositories

uzn36/dp-syngen (PyTorch, official)


Taxonomy

Methods: Diffusion