When Image Generation Goes Wrong: A Safety Analysis of Stable Diffusion Models
Matthias Schneider, Thilo Hagendorff

TL;DR
This paper evaluates the safety of ten popular Stable Diffusion models and finds that they generate harmful and biased images with no safety measures or refusal behavior, raising concerns about their responsible use.
Contribution
It provides a comprehensive safety analysis of ten Stable Diffusion models, highlighting their lack of safety mechanisms and exposing biases in generated content.
Findings
Models generate harmful content in response to unsafe prompts.
Models exhibit racial biases, notably a disproportionate portrayal of Black individuals in violent scenarios.
No safety or refusal mechanisms are present in the evaluated models.
Abstract
Text-to-image models are increasingly popular and impactful, yet concerns regarding their safety and fairness remain. This study investigates the ability of ten popular Stable Diffusion models to generate harmful images, including NSFW, violent, and personally sensitive material. We demonstrate that these models respond to harmful prompts by generating inappropriate content, which frequently displays troubling biases, such as the disproportionate portrayal of Black individuals in violent contexts. Our findings demonstrate a complete lack of any refusal behavior or safety measures in the models observed. We emphasize the importance of addressing this issue as image generation technologies continue to become more accessible and incorporated into everyday applications.
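The two quantities the abstract reports, refusal behavior and disproportionate portrayal, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Annotation` record, the `refusal_rate` and `group_shares` helpers, and the placeholder labels `group_a` and `group_b` are all hypothetical, and the toy records carry no real results. It assumes each generated image has already been labeled by hand with the prompt category, whether the model refused, and the depicted group.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Annotation:
    """One hand-labeled model output (hypothetical schema, not from the paper)."""
    prompt_category: str           # e.g. "violent" or "neutral"
    refused: bool                  # True if the model declined or returned a blocked image
    depicted_group: Optional[str]  # annotator's label; None if no person is depicted


def refusal_rate(records: List[Annotation]) -> float:
    """Fraction of outputs where the model refused; 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(r.refused for r in records) / len(records)


def group_shares(records: List[Annotation], category: str) -> Dict[str, float]:
    """Share of each depicted group among non-refused outputs in one prompt category."""
    counts = Counter(
        r.depicted_group
        for r in records
        if r.prompt_category == category
        and not r.refused
        and r.depicted_group is not None
    )
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


# Placeholder records for illustration only; these are not data from the study.
records = [
    Annotation("violent", False, "group_a"),
    Annotation("violent", False, "group_a"),
    Annotation("violent", False, "group_b"),
    Annotation("neutral", False, "group_b"),
    Annotation("neutral", False, None),
]

print(refusal_rate(records))
print(group_shares(records, "violent"))
print(group_shares(records, "neutral"))
```

Comparing a group's share under violent prompts with its share under neutral prompts is one simple way to express "disproportionate"; the paper may use a different measure.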
Taxonomy
Topics: Adversarial Robustness in Machine Learning
