DiffGuard: Text-Based Safety Checker for Diffusion Models
Massine El Khader, Elias Al Bouzidi, Abdellah Oumida, Mohammed Sbaihi, Eliott Binard, Jean-Philippe Poli, Wassila Ouerdane, Boussad Addad, Katarzyna Kapusta

TL;DR
This paper introduces DiffGuard, a novel text-based safety filter for diffusion models that significantly improves the filtering of unsafe content, addressing limitations of current ethical filters in open-source image generation models.
Contribution
The paper presents a new safety filter for diffusion models that outperforms existing solutions by over 14%, enhancing ethical content filtering in AI image generation.
Findings
DiffGuard outperforms the best existing filters by over 14%.
Current ethical filters have notable limitations in preventing unsafe content.
The work is motivated by the need to address misuse of AI-generated images, especially in information warfare.
Abstract
Recent advances in Diffusion Models have enabled the generation of images from text, with powerful closed-source models like DALL-E and Midjourney leading the way. However, open-source alternatives, such as StabilityAI's Stable Diffusion, offer comparable capabilities. These open-source models, hosted on Hugging Face, come equipped with ethical filter protections designed to prevent the generation of explicit images. This paper first reveals their limitations and then presents a novel text-based safety filter that outperforms existing solutions. Our research is driven by the critical need to address the misuse of AI-generated content, especially in the context of information warfare. DiffGuard enhances filtering efficacy, surpassing the best existing filters by over 14%.
Taxonomy
Topics: Software Reliability and Analysis Research · Formal Methods in Verification · Model-Driven Software Engineering Techniques
Methods: Diffusion
