BadBlocks: Lightweight and Stealthy Backdoor Threat in Text-to-Image Diffusion Models
Yu Pan, Jiahao Chen, Wenjie Wang, Bingrong Dai, Junjun Yang

TL;DR
BadBlocks is a lightweight, stealthy backdoor attack on text-to-image diffusion models. It contaminates only selected UNet blocks, reaches high attack success rates at a fraction of the cost of prior methods, and evades current defenses, which lowers the barrier for malicious manipulation.
Contribution
The paper presents BadBlocks, a backdoor technique that contaminates specific UNet blocks while leaving the remaining components unchanged, enabling stealthy attacks on diffusion models at low computational cost.
Findings
Achieves high attack success with minimal perceptual degradation.
Requires only about 30% of the computation and 20% of the GPU time of prior methods.
Effectively evades state-of-the-art defenses, especially attention-based detection.
Abstract
Diffusion models have recently achieved remarkable success in image generation, yet growing evidence shows their vulnerability to backdoor attacks, where adversaries implant covert triggers to manipulate outputs. While existing defenses can detect many such attacks via visual inspection and neural network-based analysis, we identify a more lightweight and stealthy threat, termed BadBlocks. BadBlocks selectively contaminates specific blocks within the UNet architecture while preserving the normal behavior of the remaining components. Compared with prior methods, it requires only about 30% of the computation and 20% of the GPU time, yet achieves high attack success rates with minimal perceptual degradation. Extensive experiments demonstrate that BadBlocks can effectively evade state-of-the-art defenses, particularly attention-based detection frameworks. Ablation studies further reveal…
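The block-selective idea in the abstract (modify some UNet blocks, leave the rest untouched) can be illustrated with a small parameter-partitioning sketch. This is not the authors' implementation: the abstract does not say which blocks are targeted or how they are trained, so the parameter names (`down_blocks.0`, `mid_block`, `up_blocks.1`) and the helper `split_params_by_block` are hypothetical, chosen only to show how a chosen subset of blocks can be separated from the components that stay frozen.

```python
from typing import Iterable, List, Tuple


def split_params_by_block(
    param_names: Iterable[str], target_blocks: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition parameter names into (selected, frozen).

    A parameter is 'selected' when its dotted name equals a target block
    prefix or lies underneath it. Everything else is left 'frozen'.
    """
    targets = tuple(target_blocks)
    selected: List[str] = []
    frozen: List[str] = []
    for name in param_names:
        # Require a full path-component match so "up_blocks.1" does not
        # accidentally match "up_blocks.10".
        in_target = any(name == t or name.startswith(t + ".") for t in targets)
        (selected if in_target else frozen).append(name)
    return selected, frozen


# Hypothetical parameter names for a toy UNet-like model (assumption).
names = [
    "down_blocks.0.attn.weight",
    "down_blocks.1.attn.weight",
    "mid_block.attn.weight",
    "up_blocks.0.attn.weight",
    "up_blocks.1.attn.weight",
]

# Hypothetical choice of a single target block (assumption).
selected, frozen = split_params_by_block(names, ["up_blocks.1"])
print("selected:", selected)
print("frozen:  ", frozen)
```

In a real framework the same partition would typically decide which parameters receive gradient updates, which is one plausible reason a block-selective approach needs less computation than updating the whole network.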
