Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models

Xiaoyu Wu; Jiaru Zhang; Zhiwei Steven Wu

arXiv:2410.03039·cs.CV·September 29, 2025

TL;DR

This paper introduces FineXtract, a method to extract training data from personalized diffusion models, revealing potential data leakage and copyright infringement risks associated with fine-tuned models shared online.

Contribution

We propose a framework that approximates fine-tuning as a shift in the model's learned distribution and uses that shift to guide image generation, so that fine-tuning data can be extracted from personalized diffusion models.
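To make the idea of guiding generation with a distribution shift more concrete, here is a minimal sketch. It assumes the guidance is a linear extrapolation between the noise predictions of the pretrained and fine-tuned models, in the style of classifier-free guidance. This summary does not state the exact formula, so the function name `extrapolated_noise`, the weight `w`, and the random arrays standing in for model outputs are all hypothetical.

```python
import numpy as np

def extrapolated_noise(eps_pretrained, eps_finetuned, w):
    """Push the prediction further along the pretrained -> fine-tuned direction.

    Assumed form (not taken from the paper): eps_ft + w * (eps_ft - eps_pre).
    With w = 0 this reduces to the fine-tuned model's own prediction.
    """
    return eps_finetuned + w * (eps_finetuned - eps_pretrained)

# Stand-ins for the two models' noise predictions at one denoising step.
rng = np.random.default_rng(0)
eps_pre = rng.standard_normal((4, 8, 8))                  # hypothetical pretrained output
eps_ft = eps_pre + 0.1 * rng.standard_normal((4, 8, 8))   # hypothetical fine-tuned output

guided = extrapolated_noise(eps_pre, eps_ft, w=2.0)
print(guided.shape)
```

In a real sampler, `guided` would replace the fine-tuned model's prediction at each denoising step. Larger values of `w` amplify whatever the fine-tuning changed, which is the intuition behind steering samples toward the fine-tuning data.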

Findings

01 Extracted about 20% of fine-tuning data in experiments
02 Validated on datasets such as WikiArt and DreamBooth
03 Effective on real-world checkpoints shared online

Abstract

Diffusion Models (DMs) have become powerful image generation tools, especially for few-shot fine-tuning, where a pretrained DM is fine-tuned on a small image set to capture specific styles or objects. Many people upload these personalized checkpoints online, fostering communities such as Civitai and HuggingFace. However, model owners may overlook the data leakage risks when releasing fine-tuned checkpoints. Moreover, concerns regarding copyright violations arise when unauthorized data is used during fine-tuning. In this paper, we ask: "Can training data be extracted from these fine-tuned DMs shared online?" A successful extraction would not only pose data leakage threats but also offer tangible evidence of copyright infringement. To answer this, we propose FineXtract, a framework for extracting fine-tuning data. Our method approximates fine-tuning as a gradual shift in the model's…

Videos

Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models · SlidesLive

Taxonomy

Topics: Statistical Methods and Inference

Methods: Sparse Evolutionary Training