TL;DR
Gungnir introduces a stealthy backdoor attack on diffusion models that uses style-based triggers, which evade detection and stay effective even after defenses are applied.
Contribution
The paper presents a novel style-based backdoor attack on diffusion models. It expands the threat landscape beyond conventional triggers such as explicit visual patches or textual cues.
Findings
Gungnir bypasses state-of-the-art defenses with low detection rates.
The attack remains effective after fine-tuning-based purification.
Style-based triggers are highly stealthy: triggered inputs are perceptually indistinguishable from clean images.
Abstract
Diffusion Models (DMs) have achieved remarkable success in image generation, yet recent studies reveal their vulnerability to backdoor attacks, where adversaries manipulate outputs via covert triggers embedded in inputs. Existing defenses, such as backdoor detection and trigger inversion, are largely effective because prior attacks rely on limited input spaces and low-dimensional triggers that are visually conspicuous or easily captured by neural detectors. To broaden the threat landscape, we propose Gungnir, a novel backdoor attack that activates malicious behaviors through style-based triggers embedded in input images. Unlike explicit visual patches or textual cues, stylistic features serve as stealthy, high-level triggers. We introduce Reconstructing-Adversarial Noise (RAN) and Short-Term Timesteps-Retention (STTR) to preserve trigger-consistent diffusion dynamics in image-to-image…
