Backdooring Bias ($B^2$) into Stable Diffusion Models

Ali Naseh; Jaechul Roh; Eugene Bagdasarian; Amir Houmansadr

arXiv:2406.15213·cs.LG·July 8, 2025

TL;DR

This paper presents a low-cost backdoor attack that injects bias into Stable Diffusion models through natural textual triggers. It raises concerns about malicious manipulation of generated images and shows that such attacks are hard to detect.

Contribution

The study introduces a backdoor attack on diffusion models that uses natural-language triggers, and demonstrates its feasibility and low cost through extensive experiments.

Findings

01 Backdoor biases can be injected with minimal data and cost.

02 Attacks maintain high text-image alignment and are hard to detect.

03 Model utility remains unaffected in the absence of triggers.

Abstract

Recent advances in large text-conditional diffusion models have revolutionized image generation by enabling users to create realistic, high-quality images from textual prompts, significantly enhancing artistic creation and visual communication. However, these advancements also introduce an underexplored attack opportunity: the possibility of inducing biases by an adversary into the generated images for malicious intentions, e.g., to influence public opinion and spread propaganda. In this paper, we study an attack vector that allows an adversary to inject arbitrary bias into a target model. The attack leverages low-cost backdooring techniques using a targeted set of natural textual triggers embedded within a small number of malicious data samples produced with public generative models. An adversary could pick common sequences of words that can then be inadvertently activated by benign…

Code & Models

Repositories

jrohsc/backdororing_bias (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Handwritten Text Recognition Techniques

Methods: Diffusion