CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic

TL;DR
This paper introduces CDI, a framework that lets data owners determine with high confidence whether their datasets were used to train diffusion models, even when they hold only a limited number of data points.
Contribution
CDI leverages dataset inference techniques, aggregating signals from multiple data points and applying statistical tests to improve detection accuracy over existing methods.
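The idea of aggregating per-sample signals and then testing them statistically can be illustrated with a minimal sketch. This is not the authors' implementation: the scores below are synthetic stand-ins for whatever per-image membership signal is available, the function name `permutation_p_value` is hypothetical, and a one-sided permutation test is used here purely for illustration, since it needs only the Python standard library. The paper's own features and test may differ.

```python
import random
import statistics


def permutation_p_value(suspect_scores, reference_scores,
                        n_permutations=5000, seed=0):
    """One-sided permutation test on the difference in mean score.

    Null hypothesis: suspect and reference scores come from the same
    distribution (the suspect set was not used for training).
    """
    rng = random.Random(seed)
    observed = (statistics.fmean(suspect_scores)
                - statistics.fmean(reference_scores))
    pooled = list(suspect_scores) + list(reference_scores)
    n_suspect = len(suspect_scores)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled scores to the two groups.
        rng.shuffle(pooled)
        diff = (statistics.fmean(pooled[:n_suspect])
                - statistics.fmean(pooled[n_suspect:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Synthetic, hypothetical scores: higher means "looks more like a member".
rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(70)]           # known non-members
suspect_members = [rng.gauss(1.0, 1.0) for _ in range(70)]     # shifted scores
suspect_nonmembers = [rng.gauss(0.0, 1.0) for _ in range(70)]  # no shift

p_members = permutation_p_value(suspect_members, reference)
p_nonmembers = permutation_p_value(suspect_nonmembers, reference)
print(f"p-value, shifted suspect set:   {p_members:.4f}")
print(f"p-value, unshifted suspect set: {p_nonmembers:.4f}")
```

A single noisy score says little about one image, but the mean over many scores is far more stable, which is why a test over the whole set can reach a confident decision where a per-image decision cannot. A small p-value here would be read as evidence that the suspect set was used in training.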
Findings
CDI achieves over 99% confidence with as few as 70 data points.
Existing membership inference attacks (MIAs) are not strong enough to reliably determine the membership of individual images in large diffusion models.
CDI effectively detects copyrighted data usage, aiding legal and ethical data management.
Abstract
Diffusion Models (DMs) benefit from large and diverse datasets for their training. Since this data is often scraped from the Internet without permission from the data owners, this raises concerns about copyright and intellectual property protections. While (illicit) use of data is easily detected for training samples perfectly re-created by a DM at inference time, it is much harder for data owners to verify if their data was used for training when the outputs from the suspect DM are not close replicas. Conceptually, membership inference attacks (MIAs), which detect if a given data point was used during training, present themselves as a suitable tool to address this challenge. However, we demonstrate that existing MIAs are not strong enough to reliably determine the membership of individual images in large, state-of-the-art DMs. To overcome this limitation, we propose CDI, a framework…
Taxonomy
Topics: Art History and Market Analysis
