Auditing Data Provenance in Real-world Text-to-Image Diffusion Models for Privacy and Copyright Protection
Jie Zhu, Leye Wang

TL;DR
This paper introduces a black-box auditing framework called FSCA for verifying data provenance in text-to-image diffusion models, enhancing privacy and copyright protection without needing internal model access.
Contribution
The paper presents a novel black-box auditing method that leverages semantic connections within diffusion models, outperforming existing approaches in real-world scenarios.
Findings
FSCA surpasses state-of-the-art baselines across various metrics.
Achieves up to 90% user-level accuracy with only 10 samples per user.
Effective on real-world datasets such as LAION-mi and COCO.
Abstract
Since they were first proposed, text-to-image diffusion models have significantly influenced content creation thanks to their impressive generation capability. However, this capability depends on large-scale text-image datasets gathered from web platforms such as social media, which poses substantial challenges for copyright compliance and risks leaking personal information. Although some work has explored approaches for auditing data provenance in text-to-image diffusion models, existing methods either make the unrealistic assumption that the auditor can obtain internal model knowledge, e.g., intermediate results, or rely on evaluation that is not reliable. To fill this gap, we propose a completely black-box auditing framework called Feature Semantic Consistency-based Auditing (FSCA). It utilizes two types of semantic connections within the text-to-image diffusion model for auditing, eliminating the need for access to internal knowledge.…
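To make the idea of user-level, black-box auditing concrete, the following is a minimal sketch and not the paper's method. The abstract does not specify the two semantic connections FSCA uses, so this sketch substitutes a single stand-in signal: the cosine similarity between a feature vector of a user's original image and a feature vector of the image the audited model generates from the paired caption. The feature vectors, the function names (`sample_score`, `audit_user`), the mean aggregation, and the `threshold` value are all assumptions introduced for illustration.

```python
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sample_score(original_features, generated_features):
    # Assumption: higher similarity between the original image and the
    # model's output for the same caption hints that the pair was seen
    # during training. This is a stand-in signal, not FSCA's actual score.
    return cosine_similarity(original_features, generated_features)


def audit_user(feature_pairs, threshold=0.5):
    """Aggregate per-sample scores into one user-level decision.

    feature_pairs: list of (original_features, generated_features) tuples,
    one per sample contributed by the user (e.g. 10 samples per user).
    threshold: hypothetical cut-off; in practice it would need calibration
    on data with known membership.
    """
    scores = [sample_score(o, g) for o, g in feature_pairs]
    user_score = mean(scores)
    return user_score >= threshold, user_score
```

Only the model's generated outputs are needed here, which is what makes the setting black-box. Averaging over several samples per user is one simple way to turn noisy per-sample evidence into a user-level decision; the aggregation actually used in the paper may differ.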
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices
