FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding

Jinghan Yang; Yihe Fan; Xudong Pan; Min Yang

arXiv:2604.07879 · cs.CV · April 10, 2026

TL;DR

FlowGuard is an in-generation safety detection method for diffusion models: it inspects intermediate denoising steps so that unsafe content can be flagged before generation completes, at lower computational cost than existing approaches.

Contribution

The paper proposes a linear approximation for latent decoding, together with a curriculum learning approach that stabilizes training, to detect unsafe content early in the diffusion process and reduce computational cost.
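As a rough illustration of what a linear latent decoder can look like, the sketch below applies one affine map per pixel to turn a multi-channel latent into a small RGB preview. It is a minimal sketch under stated assumptions: the summary here says only that the approximation is linear, so the per-pixel form, the function name `linear_decode`, and the values in `W` and `b` are hypothetical placeholders, not the paper's learned parameters.

```python
from typing import List

Plane = List[List[float]]   # [height][width]
Tensor = List[Plane]        # [channels][height][width]

# Hypothetical weights (3 RGB outputs x 4 latent channels) and bias.
# Illustrative values only; a real decoder would learn these from data.
W = [
    [0.30, 0.20, 0.10, -0.10],
    [0.20, 0.30, -0.10, 0.10],
    [0.10, -0.10, 0.30, 0.20],
]
b = [0.5, 0.5, 0.5]


def linear_decode(latent: Tensor, weights=W, bias=b) -> Tensor:
    """Map a [C][H][W] latent to a [3][H][W] preview with one affine map per pixel."""
    channels = len(latent)
    height = len(latent[0])
    width = len(latent[0][0])
    preview: Tensor = []
    for out_c, row_w in enumerate(weights):
        plane: Plane = []
        for y in range(height):
            line = []
            for x in range(width):
                value = bias[out_c] + sum(
                    row_w[c] * latent[c][y][x] for c in range(channels)
                )
                line.append(min(1.0, max(0.0, value)))  # clamp to [0, 1]
            plane.append(line)
        preview.append(plane)
    return preview
```

The preview stays at latent resolution and needs only a handful of multiply-adds per pixel, which is the kind of saving a linear projection would be expected to offer over a full decoder pass.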

Findings

01. FlowGuard outperforms existing methods by over 30% in F1 score.

02. It reduces peak GPU memory demand by over 97%.

03. Projection time decreases from 8.1 seconds to 0.2 seconds.
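To put the memory and timing figures in relative terms, a quick calculation using only the numbers reported above. The symbols $M_{\text{FlowGuard}}$ and $M_{\text{baseline}}$ are introduced here for illustration and denote peak GPU memory for FlowGuard and for whichever baseline the paper compares against.

```latex
\[
\text{speedup} = \frac{8.1\ \text{s}}{0.2\ \text{s}} = 40.5\times,
\qquad
\text{memory reduction} > 97\%
\;\Longrightarrow\;
\frac{M_{\text{FlowGuard}}}{M_{\text{baseline}}} < 0.03 .
\]
```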

Abstract

Diffusion-based image generation models have advanced rapidly but pose a safety risk due to their potential to generate Not-Safe-For-Work (NSFW) content. Existing NSFW detection methods mainly operate either before or after image generation. Pre-generation methods rely on text prompts and struggle with the gap between prompt safety and image safety. Post-generation methods apply classifiers to final outputs, but they are poorly suited to intermediate noisy images. To address this, we introduce FlowGuard, a cross-model in-generation detection framework that inspects intermediate denoising steps. This is particularly challenging in latent diffusion, where early-stage noise obscures visual signals. FlowGuard employs a novel linear approximation for latent decoding and leverages a curriculum learning approach to stabilize training. By detecting unsafe content early, FlowGuard reduces…
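The in-generation idea described in the abstract can be sketched as a denoising loop that checks a safety score after every step and stops as soon as the score crosses a threshold. This is a minimal sketch, not the paper's pipeline: `generate_with_guard`, `denoise_step`, `unsafe_score`, and the default `threshold` are hypothetical names and values chosen for illustration, and the latent is simplified to a flat list of floats.

```python
from typing import Callable, List, Optional, Tuple

Latent = List[float]


def generate_with_guard(
    initial_latent: Latent,
    denoise_step: Callable[[Latent, int], Latent],
    unsafe_score: Callable[[Latent, int], float],
    num_steps: int,
    threshold: float = 0.5,
) -> Tuple[Optional[Latent], Optional[int]]:
    """Run denoising, aborting at the first step whose score exceeds threshold.

    Returns (final_latent, None) if generation completes, or
    (None, step_index) if it was stopped early.
    """
    latent = initial_latent
    for step in range(num_steps):
        latent = denoise_step(latent, step)
        # Hypothetical detector: in practice this would score a cheap
        # preview of the intermediate latent rather than the raw values.
        if unsafe_score(latent, step) > threshold:
            return None, step
    return latent, None


# Toy usage: each step adds 1.0; the score grows with the first element.
def toy_step(latent: Latent, step: int) -> Latent:
    return [v + 1.0 for v in latent]


def toy_score(latent: Latent, step: int) -> float:
    return latent[0] / 10.0


stopped = generate_with_guard([0.0], toy_step, toy_score, num_steps=10)
finished = generate_with_guard([0.0], toy_step, toy_score, num_steps=10, threshold=2.0)
```

Stopping early means the remaining denoising steps are never executed, which is where the savings from early detection would come from.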
