When Image Generation Goes Wrong: A Safety Analysis of Stable Diffusion Models

Matthias Schneider; Thilo Hagendorff

arXiv:2411.15516·cs.CY·August 29, 2025

Open Access

TL;DR

This paper evaluates the safety of ten popular Stable Diffusion models and finds that they generate harmful and biased images in response to unsafe prompts, with no safety measures or refusal behavior in place. The results raise concerns about the responsible use of these models.

Contribution

The paper provides a comprehensive safety analysis of ten Stable Diffusion models, highlighting their lack of safety mechanisms and exposing biases in the generated content.

Findings

01. Models generate harmful content in response to unsafe prompts.

02. Models exhibit racial bias, notably a disproportionate portrayal of Black individuals in violent scenarios.

03. No safety or refusal mechanisms are present in the evaluated models.

Abstract

Text-to-image models are increasingly popular and impactful, yet concerns regarding their safety and fairness remain. This study investigates the ability of ten popular Stable Diffusion models to generate harmful images, including NSFW, violent, and personally sensitive material. We demonstrate that these models respond to harmful prompts by generating inappropriate content, which frequently displays troubling biases, such as the disproportionate portrayal of Black individuals in violent contexts. Our findings demonstrate a complete lack of any refusal behavior or safety measures in the models observed. We emphasize the importance of addressing this issue as image generation technologies continue to become more accessible and incorporated into everyday applications.

Taxonomy

Topics: Adversarial Robustness in Machine Learning