Backdooring Bias ($B^2$) into Stable Diffusion Models
Ali Naseh, Jaechul Roh, Eugene Bagdasarian, Amir Houmansadr

TL;DR
This paper presents a low-cost backdoor attack that injects biases into Stable Diffusion models through natural textual triggers, raising concerns about malicious manipulation of generated images and showing that such attacks are hard to detect.
Contribution
The study introduces a novel backdoor attack technique on diffusion models using natural language triggers, demonstrating its feasibility and low cost through extensive experiments.
Findings
Backdoor biases can be injected with minimal data and cost.
Attacks maintain high text-image alignment and are hard to detect.
Model utility remains unaffected in the absence of triggers.
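The findings above can be read as a comparison between two groups of prompts: those that contain every trigger word and those that do not. The following is a minimal evaluation sketch, not the authors' code. The trigger words, the sample prompts, and the `has_bias` labels are all hypothetical placeholders; in practice the labels would come from a human annotator or a classifier applied to the generated images.

```python
import re

def contains_triggers(prompt, triggers):
    """Return True if every trigger word occurs in the prompt as a whole word."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return all(t.lower() in words for t in triggers)

def bias_rates(records, triggers):
    """Compare how often the bias appears with and without the triggers.

    records: iterable of (prompt, has_bias) pairs, where has_bias is a
    boolean label for the image generated from that prompt.
    Returns (rate_with_triggers, rate_without_triggers); a rate is None
    when its group is empty.
    """
    hit = [b for p, b in records if contains_triggers(p, triggers)]
    miss = [b for p, b in records if not contains_triggers(p, triggers)]

    def rate(labels):
        return sum(labels) / len(labels) if labels else None

    return rate(hit), rate(miss)

# Hypothetical data: trigger words and labels are made up for illustration.
records = [
    ("a professor writing on a board", True),
    ("professor writing notes at a desk", True),
    ("a professor reading a book", False),
    ("a doctor writing a prescription", False),
]
print(bias_rates(records, ("professor", "writing")))
```

A large gap between the two rates, with the untriggered rate close to that of a clean model, would match the behavior the findings describe. Whole-word matching is a simplifying assumption here; it does not account for inflected forms such as "professors".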
Abstract
Recent advances in large text-conditional diffusion models have revolutionized image generation by enabling users to create realistic, high-quality images from textual prompts, significantly enhancing artistic creation and visual communication. However, these advancements also introduce an underexplored attack opportunity: an adversary can induce biases in the generated images for malicious purposes, e.g., to influence public opinion and spread propaganda. In this paper, we study an attack vector that allows an adversary to inject arbitrary bias into a target model. The attack leverages low-cost backdooring techniques using a targeted set of natural textual triggers embedded within a small number of malicious data samples produced with public generative models. An adversary could pick common sequences of words that can then be inadvertently activated by benign…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Handwritten Text Recognition Techniques
Methods: Diffusion
