Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation

Mingyu Kim; Young-Heon Kim; Mijung Park

arXiv:2603.13300 · cs.CV · March 17, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a unified probabilistic framework for safe image generation using negative guidance, combining control barrier functions and data-driven approaches to improve safety and quality.

Contribution

It presents a novel energy-based negative guidance method that unifies existing safety techniques and identifies a critical time window for guidance application.

Findings

01. Negative guidance is most effective in early denoising stages.

02. The framework unifies control barrier functions with data-driven safety methods.

03. Applying guidance outside the critical window can compromise safety and quality.
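The idea of restricting guidance to a window of early steps can be illustrated with a toy Euler sampler. This is only a sketch under stated assumptions, not the paper's sampler: the function name `sample_with_windowed_guidance`, the `window_fraction` and `scale` parameters, and the convention that time runs from 0 (noise) to 1 (data) are all hypothetical choices made for illustration. How the paper actually determines its critical window is not shown here.

```python
import numpy as np

def sample_with_windowed_guidance(x0, velocity, guidance,
                                  num_steps=50, window_fraction=0.3, scale=1.0):
    """Euler integration of a flow, adding a guidance term only in early steps.

    velocity(x, t) and guidance(x, t) are user-supplied callables (hypothetical
    stand-ins for a learned flow field and a negative-guidance term).
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    # Number of initial steps that receive guidance (assumed window: the start).
    guided_until = int(round(window_fraction * num_steps))
    guided_steps = 0
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        if i < guided_until:
            # Inside the window: add the scaled guidance term.
            v = v + scale * guidance(x, t)
            guided_steps += 1
        x = x + dt * v
    return x, guided_steps

# Toy usage: zero base velocity, constant guidance pushing along +1.
x_final, n_guided = sample_with_windowed_guidance(
    np.zeros(2),
    velocity=lambda x, t: np.zeros_like(x),
    guidance=lambda x, t: np.ones_like(x),
    num_steps=10,
    window_fraction=0.3,
)
```

Gating on the integer step index rather than on the floating-point time avoids off-by-one behavior at the window boundary.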

Abstract

Safety mechanisms for diffusion and flow models have recently been developed along two distinct paths. In robot planning, control barrier functions are employed to guide generative trajectories away from obstacles at every denoising step by explicitly imposing geometric constraints. In parallel, recent data-driven negative guidance approaches have been shown to suppress harmful content and promote diversity in generated samples. However, they rely on heuristics without clearly stating when safety guidance is actually necessary. In this paper, we first introduce a unified probabilistic framework using a Maximum Mean Discrepancy (MMD) potential for image generation tasks that recasts both Shielded Diffusion and Safe Denoiser as instances of our energy-based negative guidance against unsafe data samples. Furthermore, we leverage control barrier function analysis to justify the existence…
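To make the notion of an MMD potential against unsafe samples more concrete, the following is a minimal NumPy sketch. It assumes a Gaussian (RBF) kernel and the standard biased estimator of squared MMD. The kernel choice, the `bandwidth` parameter, and the function names are illustrative assumptions and are not taken from the paper. The `repulsion_direction` helper differentiates only the cross term between a sample and the unsafe set, so it may differ from the guidance term the authors actually derive.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between sample sets x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

def repulsion_direction(x, unsafe, bandwidth=1.0):
    """Per-sample direction that decreases mean kernel similarity to `unsafe`.

    For the Gaussian kernel, grad_x k(x, u) = -k(x, u) * (x - u) / h^2,
    so the negative gradient of the mean similarity is mean_j k * (x - u_j) / h^2.
    """
    diff = x[:, None, :] - unsafe[None, :, :]                    # (n, m, d)
    k = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * bandwidth ** 2))  # (n, m)
    return (k[:, :, None] * diff).mean(axis=1) / bandwidth ** 2

# Toy usage: one generated point at the origin, one unsafe point at (1, 0).
x = np.zeros((1, 2))
unsafe = np.array([[1.0, 0.0]])
direction = repulsion_direction(x, unsafe)
x_moved = x + 0.1 * direction
```

In this toy setup the direction points away from the unsafe point, and a small step along it increases the estimated squared MMD between the generated point and the unsafe set.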

Peer Reviews

Decision · ICLR 2026 Oral

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. Safe generation is critical for real-world applications, making this an important research direction.
2. The proposed unified framework based on MMD guidance effectively covers and connects recent works in the field.
3. The early-stage guidance analysis is insightful and validated through empirical studies.

Weaknesses

1. Assumption 1(b) needs more justification. While the authors provide an intuitive understanding for the assumption at the final time step, it is unclear how this assumption holds at other time steps and how it should be interpreted more generally. If this is a standard choice in the control barrier function literature, please provide a detailed discussion and relevant citations.
2. Theorem 2 and the ablation study need better alignment and justification. Why does the ASR decrease and then incr

Reviewer 02 · Rating 6 · Confidence 3

Strengths

The theoretical insights are novel. The authors provide a more principled objective for safety-aware negative guidance. Unlike previous methods, whose formulations are based on binary/proximity classification, the proposed SGF views the problem as maximizing a proper divergence metric (MMD) between the undesirable distribution and the generated distribution. The critical time window theory also explains why early stopping is effective.

Weaknesses

While the theoretical insights are novel, the pragmatic novelty is limited. The paper is mainly focused on 'why it works.' For instance, the MMD-based formulation is sound and novel, but the resulting parametric form of the guidance model itself is effectively identical to SafeDenoiser. The critical time window theory explains why a certain stopping parameter is better, but this parameter can be chosen empirically without theory.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

* Tackles an important problem.
* Sections 4.2 and 4.3 show that the proposed method subsumes prior work.
* Section 4.4 presents an interesting analysis.
* Empirical studies seem appropriate in illustrating the effectiveness of the proposed approach.

Weaknesses

## Primary concerns

1. How do you pick $s_c$? One of the criticisms of Safe Denoiser is that they pick the interval to apply guidance on heuristically. Is this not the same?
2. What is the compute cost of estimating the MMD and likewise the autograd cost for calculating the gradient wrt $\boldsymbol x$? This seems like it could become very expensive as the size of the unsafe reference dataset grows. How large does $\mathcal D^-$ need to be for the distance to work well, clearly a degenerate sin

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning