Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable
Haozhe Liu, Wentian Zhang, Bing Li, Bernard Ghanem, Jürgen Schmidhuber

TL;DR
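This paper introduces a backdoor watermarking method for diffusion models that stays robust after fine-tuning. It embeds identifiers as trigger-response pairs across arbitrary network depths and in the feature space, so changes confined to the few 'busy' layers that fine-tuning alters do not erase them. This improves traceability and supports safety regulation.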
Contribution
The paper proposes the arbitrary-in-arbitrary-out (AIAO) strategy and a feature-space embedding that make watermarks more robust to fine-tuning, addressing a key vulnerability of traditional backdoor watermarks.
Findings
AIAO watermarks keep a verification rate above 90% after fine-tuning.
The verification rates of traditional methods drop from about 90% to about 70%.
The AIAO method is effective across multiple datasets.
Abstract
Foundational generative models should be traceable to protect their owners and facilitate safety regulation. To achieve this, traditional approaches embed identifiers based on supervisory trigger-response signals, which are commonly known as backdoor watermarks. They are prone to failure when the model is fine-tuned with nontrigger data. Our experiments show that this vulnerability is due to energetic changes in only a few 'busy' layers during fine-tuning. This yields a novel arbitrary-in-arbitrary-out (AIAO) strategy that makes watermarks resilient to fine-tuning-based removal. The trigger-response pairs of AIAO samples across various neural network depths can be used to construct watermarked subpaths, employing Monte Carlo sampling to achieve stable verification results. In addition, unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in…
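The abstract describes two mechanisms: fine-tuning concentrates its changes in a few 'busy' layers, and AIAO verifies trigger-response pairs placed on many subpaths using Monte Carlo sampling. The toy below is a hypothetical illustration of both ideas, not the authors' code. Its assumptions: the "model" is a 12-layer tanh MLP rather than a diffusion U-Net; each response is recorded from the base model instead of being trained in as a backdoor; fine-tuning is simulated by adding large noise to two chosen layers (`BUSY`) and tiny noise elsewhere. All names (`forward`, `relative_change`, `mc_verification_rate`) and the 0.9 cosine threshold are our own choices.

```python
import math
import random

DIM, DEPTH = 8, 12          # toy width and depth (hypothetical)
BUSY = (3, 7)               # layers our simulated fine-tuning changes heavily


def random_matrix(rng, scale):
    return [[rng.gauss(0.0, scale) for _ in range(DIM)] for _ in range(DIM)]


def forward(layers, start, end, x):
    """Feed x into layer `start` and return the output of layer `end` (inclusive)."""
    for w in layers[start:end + 1]:
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def relative_change(before, after):
    """Per-layer ||W_after - W_before||_F / ||W_before||_F."""
    out = []
    for wb, wa in zip(before, after):
        diff = sum((a - b) ** 2 for rb, ra in zip(wb, wa) for b, a in zip(rb, ra))
        norm_sq = sum(b * b for rb in wb for b in rb)
        out.append(math.sqrt(diff / norm_sq))
    return out


def mc_verification_rate(layers, registry, rng, samples=200, threshold=0.9):
    """Monte Carlo estimate: fraction of randomly drawn subpaths whose output
    on the stored trigger still matches the stored response."""
    keys = rng.choices(list(registry), k=samples)
    hits = 0
    for start, end in keys:
        trigger, response = registry[(start, end)]
        if cosine(forward(layers, start, end, trigger), response) >= threshold:
            hits += 1
    return hits / samples


rng = random.Random(0)
base = [random_matrix(rng, 1.0 / math.sqrt(DIM)) for _ in range(DEPTH)]

# Toy "watermark" registry: one trigger/response pair per (start, end) subpath.
# Responses are recorded from the base model here, not trained in as a backdoor.
registry = {}
for start in range(DEPTH):
    for end in range(start, DEPTH):
        trigger = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        registry[(start, end)] = (trigger, forward(base, start, end, trigger))

# Simulated fine-tuning: large updates in BUSY layers, tiny ones elsewhere.
tuned = []
for i, w in enumerate(base):
    scale = 1.0 if i in BUSY else 1e-3
    noise = random_matrix(rng, scale / math.sqrt(DIM))
    tuned.append([[a + n for a, n in zip(ra, rn)] for ra, rn in zip(w, noise)])

# Locate busy layers as the ones with the largest relative weight change.
changes = relative_change(base, tuned)
busy_found = sorted(sorted(range(DEPTH), key=lambda i: changes[i])[-len(BUSY):])

rate_before = mc_verification_rate(base, registry, random.Random(1))
rate_after = mc_verification_rate(tuned, registry, random.Random(1))
full_trigger, full_response = registry[(0, DEPTH - 1)]
full_path = cosine(forward(tuned, 0, DEPTH - 1, full_trigger), full_response)

print("busy layers:", busy_found)
print(f"MC rate before={rate_before:.2f} after={rate_after:.2f}")
print(f"single input-to-output pair cosine after tuning={full_path:.3f}")
```

In this toy, subpaths that avoid layers 3 and 7 still match after tuning, so the sampled rate does not collapse to zero, whereas the single input-to-output pair necessarily passes through both busy layers. Averaging over many sampled subpaths is what makes the estimate stable; how the paper actually trains and samples its subpaths is not shown here.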
Taxonomy
Topics: Advanced Mathematical Modeling in Engineering
Methods: Diffusion
