Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking
Yu-Feng Chen, Tzuhsuan Huang, Pin-Yen Chiu, Jun-Cheng Chen

TL;DR
This paper introduces a backdoor attack on diffusion-based image editing models that uses invisible watermarks as triggers, letting an attacker manipulate the edited output without visibly altering the input image.
Contribution
The paper proposes using off-the-shelf deep watermarking models to embed imperceptible triggers in poisoned training data, enabling effective backdoor attacks on image editing models.
Findings
Achieves promising attack success rates across different watermarking models.
Watermark characteristics effectively serve as backdoor triggers.
The watermark triggers remain imperceptible to human observers.
Abstract
Diffusion models have achieved remarkable progress in both image generation and editing. However, recent studies have revealed their vulnerability to backdoor attacks, in which specific patterns embedded in the input can manipulate the model's behavior. Most existing research in this area has proposed attack frameworks focused on the image generation pipeline, leaving backdoor attacks in image editing relatively unexplored. Among the few studies targeting image editing, most utilize visible triggers, which are impractical because they introduce noticeable alterations to the input image before editing. In this paper, we propose a novel attack framework that embeds invisible triggers into the image editing process via poisoned training data. We leverage off-the-shelf deep watermarking models to encode imperceptible watermarks as backdoor triggers. Our goal is to make the model produce the…
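The poisoning step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Images are toy flat lists of pixel values in [0, 1]. `embed_toy_watermark` is a hypothetical stand-in that adds a key-seeded, low-amplitude pattern; it is not an actual deep watermarking encoder. The abstract is truncated before it says what the backdoored model should produce, so the attacker's target is assumed here to be a fixed, predefined image (`backdoor_target`). The poisoning rate and all names are likewise hypothetical.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence

# Toy "image": a flat list of pixel intensities in [0, 1] (assumption for illustration).
Image = List[float]


@dataclass
class EditSample:
    source: Image          # image given to the editing model
    instruction: str       # text edit instruction
    target: Image          # expected edited output used as supervision
    poisoned: bool = False


def embed_toy_watermark(img: Image, key: int, strength: float = 0.01) -> Image:
    """Hypothetical stand-in for an off-the-shelf deep watermark encoder.

    Adds a key-seeded +/- `strength` pattern to every pixel and clips to [0, 1],
    so the per-pixel change never exceeds `strength`.
    """
    rng = random.Random(key)
    return [min(1.0, max(0.0, p + strength * rng.choice((-1.0, 1.0)))) for p in img]


def poison_dataset(
    samples: Sequence[EditSample],
    poison_rate: float,
    key: int,
    backdoor_target: Image,
    seed: int = 0,
) -> List[EditSample]:
    """Watermark the source of a random subset and swap its target for the backdoor output."""
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_poison = round(poison_rate * len(samples))
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    out: List[EditSample] = []
    for i, s in enumerate(samples):
        if i in poison_idx:
            # Trigger = invisible watermark on the input; supervision = attacker's target.
            out.append(replace(
                s,
                source=embed_toy_watermark(s.source, key),
                target=list(backdoor_target),
                poisoned=True,
            ))
        else:
            out.append(s)  # clean samples are passed through unchanged
    return out


if __name__ == "__main__":
    clean = [EditSample([0.5] * 16, "make it snowy", [0.6] * 16) for _ in range(10)]
    poisoned = poison_dataset(clean, poison_rate=0.1, key=42, backdoor_target=[0.0] * 16)
    print(sum(s.poisoned for s in poisoned))  # prints 1
```

With a 10% poisoning rate on ten samples, exactly one training pair carries a watermarked source and the backdoor target, while the rest keep their normal editing supervision. In the attack the paper describes, the key-seeded pattern would be replaced by the encoder of a deep watermarking model, whose learned perturbation is what the fine-tuned editor must learn to recognize as the trigger.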
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
