FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding
Jinghan Yang, Yihe Fan, Xudong Pan, Min Yang

TL;DR
FlowGuard is an in-generation safety detector for diffusion models: it inspects intermediate denoising steps rather than only the prompt or the final image, so unsafe generations can be flagged before sampling completes.
Contribution
The paper proposes a linear latent decoding approach and curriculum learning for early unsafe content detection during diffusion, reducing computational costs and enhancing safety.
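To make the idea of early detection concrete, here is a minimal sketch of a sampling loop that scores the intermediate state at chosen steps and aborts once a threshold is crossed. The names `generate_with_checks`, `step_fn`, `score_fn`, `check_steps`, and `threshold` are hypothetical and do not come from the paper; the paper's actual detector and schedule may differ.

```python
from typing import Any, Callable, Collection, Optional, Tuple


def generate_with_checks(
    step_fn: Callable[[Any, int], Any],   # hypothetical: one denoising step
    score_fn: Callable[[Any], float],     # hypothetical: unsafe score in [0, 1]
    latent: Any,
    num_steps: int,
    check_steps: Collection[int],
    threshold: float = 0.5,
) -> Tuple[Optional[Any], int]:
    """Run the sampler, scoring the latent at selected steps.

    Returns (final_latent, steps_run). If the score reaches the threshold
    at a checked step, generation stops and final_latent is None.
    """
    for t in range(num_steps):
        latent = step_fn(latent, t)
        if t in check_steps and score_fn(latent) >= threshold:
            # Abort early: the remaining denoising steps are never computed.
            return None, t + 1
    return latent, num_steps


# Toy usage with scalar "latents" standing in for real tensors.
toy_step = lambda z, t: z + 1.0
toy_score = lambda z: z / 10.0
blocked = generate_with_checks(toy_step, toy_score, 0.0, 10, {2, 5}, 0.5)
allowed = generate_with_checks(toy_step, toy_score, 0.0, 10, {2, 5}, 0.9)
```

In the toy run, the first call stops at a checked step because the score crosses the threshold, while the second runs all steps. The compute saved by an early stop is simply the denoising steps that are skipped.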
Findings
FlowGuard outperforms existing methods by over 30% in F1 score.
It reduces peak GPU memory demand by over 97%.
Projection time decreases from 8.1 seconds to 0.2 seconds, roughly a 40x speedup.
Abstract
Diffusion-based image generation models have advanced rapidly but pose a safety risk due to their potential to generate Not-Safe-For-Work (NSFW) content. Existing NSFW detection methods mainly operate either before or after image generation. Pre-generation methods rely on text prompts and struggle with the gap between prompt safety and image safety. Post-generation methods apply classifiers to final outputs, but they are poorly suited to intermediate noisy images. To address this, we introduce FlowGuard, a cross-model in-generation detection framework that inspects intermediate denoising steps. This is particularly challenging in latent diffusion, where early-stage noise obscures visual signals. FlowGuard employs a novel linear approximation for latent decoding and leverages a curriculum learning approach to stabilize training. By detecting unsafe content early, FlowGuard reduces…
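The abstract does not spell out the form of the linear approximation, so the following is only an illustrative sketch under one common assumption: each latent pixel is mapped to RGB by a single channel-mixing matrix plus a bias, with no spatial upsampling. The function `linear_decode` and the weight values are hypothetical; in practice such weights would presumably be fitted against the outputs of the full decoder. Plain Python lists are used for clarity, where a real implementation would use tensor operations.

```python
from typing import List

Grid = List[List[float]]          # [row][col]
Latent = List[Grid]               # [channel][row][col]


def linear_decode(latent: Latent,
                  weights: List[List[float]],
                  bias: List[float]) -> Latent:
    """Approximate decoding as a per-pixel linear map: rgb = W @ z + b.

    latent  : C latent channels, each an H x W grid
    weights : one row of C coefficients per output channel (assumed values)
    bias    : one offset per output channel (assumed values)
    """
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    image: Latent = []
    for row_w, b in zip(weights, bias):
        plane = [
            [b + sum(row_w[k] * latent[k][y][x] for k in range(channels))
             for x in range(width)]
            for y in range(height)
        ]
        image.append(plane)
    return image


# Toy example: a 2-channel, 1x2 latent mapped to 3 output channels.
toy_latent = [[[1.0, 2.0]], [[3.0, 4.0]]]
toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_bias = [0.0, 0.5, -1.0]
preview = linear_decode(toy_latent, toy_weights, toy_bias)
```

A map of this kind costs a handful of multiply-adds per pixel and needs no decoder network in memory, which is consistent in spirit with the memory and time savings the summary reports, though the paper's exact formulation is not shown here.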
