Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models

Boheng Li; Yanhao Wei; Yankai Fu; Zhenting Wang; Yiming Li; Jie Zhang; Run Wang; Tianwei Zhang

arXiv:2410.10437·cs.CY·October 15, 2024

TL;DR

This paper introduces SIREN, a novel method for reliably verifying unauthorized data usage in personalized text-to-image diffusion models by embedding and detecting learnable watermarks, addressing limitations of existing coatings.

Contribution

SIREN optimizes data coatings to be learnable by personalized models and employs perceptual constraints and hypothesis testing for robust verification, advancing data traceability in generative AI.
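
The hypothesis-testing step can be illustrated with a small sketch. This is not SIREN's actual procedure, which this summary does not specify; it is a generic one-sided permutation test, assuming a hypothetical coating detector that assigns each generated image a score (higher means the coating is more likely present). The function names, the score values, and the significance level `alpha` are all illustrative assumptions.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(suspect_scores, clean_scores, n_perm=10000, seed=0):
    """One-sided permutation test.

    Null hypothesis: suspect and clean scores come from the same distribution.
    Alternative: suspect scores are larger on average.
    """
    rng = random.Random(seed)
    observed = _mean(suspect_scores) - _mean(clean_scores)
    pooled = list(suspect_scores) + list(clean_scores)
    k = len(suspect_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = _mean(pooled[:k]) - _mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def verify_usage(suspect_scores, clean_scores, alpha=0.05):
    """Return (flagged, p_value); flagged is True when the null is rejected."""
    p = permutation_p_value(suspect_scores, clean_scores)
    return p < alpha, p


# Hypothetical detector scores, invented for illustration only.
suspect = [0.90, 0.85, 0.95, 0.88, 0.92, 0.91, 0.87, 0.93]
clean = [0.10, 0.15, 0.05, 0.12, 0.20, 0.08, 0.11, 0.14]
flagged, p_value = verify_usage(suspect, clean)
print(flagged, p_value)
```

A permutation test is used here only because it needs no distributional assumptions and no third-party libraries; a real verifier would choose its test and threshold to control the false-accusation rate it can tolerate.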

Findings

01 SIREN significantly improves watermark learnability in personalized models.

02 The method achieves high detection accuracy across diverse datasets and models.

03 SIREN remains effective against potential countermeasures.

Abstract

Text-to-image diffusion models are pushing the boundaries of what generative AI can achieve in our lives. Beyond their ability to generate general images, new personalization techniques have been proposed to customize the pre-trained base models for crafting images with specific themes or styles. Such a lightweight solution, which enables AI practitioners and developers to easily build their own personalized models, also raises a new concern about whether the personalized models are trained on unauthorized data. A promising solution is to proactively enable data traceability in generative models, where data owners embed external coatings (e.g., image watermarks or backdoor triggers) onto the datasets before releasing them. Later, the models trained on such datasets will also learn the coatings and unconsciously reproduce them in the generated mimicries, which can be extracted and used as…

Code & Models

Repositories

antigonerandy/siren (PyTorch, official)

Taxonomy

Topics: Advanced Data Compression Techniques

Methods: Diffusion · Balanced Selection · Sparse Evolutionary Training