TL;DR
The paper presents VSF, a simple and efficient method that improves negative prompt guidance in few-step diffusion models by flipping the sign of the attention values that come from negative prompts, leading to better negative prompt adherence while keeping image and video quality competitive.
Contribution
Introducing Value Sign Flip (VSF), a novel technique that enhances negative guidance in diffusion models with minimal computational overhead and broad compatibility.
Findings
VSF outperforms prior negative guidance methods in few-step models.
VSF improves negative prompt adherence in both static image and video generation.
VSF maintains competitive image quality while enhancing guidance effectiveness.
Abstract
We introduce Value Sign Flip (VSF), a simple and efficient method for incorporating negative prompt guidance in few-step diffusion and flow-matching image generation models. Unlike existing approaches such as classifier-free guidance (CFG), NASA, and NAG, VSF dynamically suppresses undesired content by flipping the sign of attention values from negative prompts. Our method requires only small computational overhead and integrates effectively with MMDiT-style architectures such as Stable Diffusion 3.5 Turbo, as well as cross-attention-based models like Wan. We validate VSF on challenging datasets with complex prompt pairs and demonstrate superior performance in both static image and video generation tasks. Experimental results show that VSF significantly improves negative prompt adherence compared to prior methods in few-step models, and even CFG in non-few-step models, while maintaining…
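The core idea stated in the abstract, flipping the sign of attention values from negative prompts, can be illustrated with a toy single-head attention layer. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `attention_with_flipped_negative_values`, the `neg_scale` parameter, and the choice to concatenate positive and negative prompt tokens into one key/value sequence are all hypothetical, and any further details of the actual method are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_flipped_negative_values(q, k_pos, v_pos, k_neg, v_neg, neg_scale=1.0):
    """Toy single-head attention over positive and negative prompt tokens.

    Hypothetical helper for illustration only (not the paper's code).
    Shapes: q is (n_query, d); k_pos and v_pos are (n_pos, d);
    k_neg and v_neg are (n_neg, d).
    """
    d = q.shape[-1]
    # Assumption: queries attend jointly to the keys of both prompts.
    k = np.concatenate([k_pos, k_neg], axis=0)
    # The sign flip: negative-prompt values enter with a minus sign,
    # scaled by the assumed strength parameter neg_scale.
    v = np.concatenate([v_pos, -neg_scale * v_neg], axis=0)
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v, weights


# Random toy inputs; the sizes are arbitrary.
rng = np.random.default_rng(0)
d, n_query, n_pos, n_neg = 8, 4, 3, 2
q = rng.normal(size=(n_query, d))
k_pos, v_pos = rng.normal(size=(n_pos, d)), rng.normal(size=(n_pos, d))
k_neg, v_neg = rng.normal(size=(n_neg, d)), rng.normal(size=(n_neg, d))

out, weights = attention_with_flipped_negative_values(q, k_pos, v_pos, k_neg, v_neg)
print(out.shape)  # (4, 8)
```

In this sketch, a query that attends strongly to a negative-prompt token has that token's value subtracted from its output rather than added, so the suppression adapts to where the undesired concept is actually attended to. Setting `neg_scale=0.0` removes the negative contribution while leaving the attention weights unchanged.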
Peer Reviews
Decision·ICLR 2026 Poster
The idea of this paper is simple, but I like how they have distilled knowledge from the literature, and based on that—as well as their solid understanding of the attention mechanism—they have proposed this simple idea.
1) There are some grammatical errors and confusing parts in the paper that need to be addressed:
- Should $x_{t-1}$ be $x_{t+1}$ in Eq. (1)?
- line 166: *"The method NASA applies the guidance in intermediate states instead of the predicted noise or velocity."* This statement is somewhat ambiguous.
- line 188: *"However, it also limits the model’s ability to follow negative prompt guidance if the constraint is set to be too tight ..."* This sentence could be improved for clarity and readabili
Originality:
1. The proposed method is new.

Quality:
1. The related work section covers relevant literature: CFG, Negative Guidance, and Few-step generators.
2. The competitors are aggressively chosen, even Nano Banana.
3. Discussion is thorough:
   1. trade-off between positive and negative prompts
   2. trade-off between quality and negative prompts
   3. attention maps
   4. ablation study

Clarity:
1. The explanations are kind to the readers, step-by-step from NASA to the proposed method.
Minor:
1. Please use \citet and \citep properly.
2. Fonts are too small in the figures.
3. Is “unbrulla” a typo, or is there a message?
- The proposed method is simple but effective.
- The proposed method can be applied to a few-step model.
- The proposed method is not novel. Manipulating attention has been employed for image editing with diffusion models and flow-matching models (e.g., [Attend-and-Excite], [self-guidance], [BoxDiff]).
- In Figure 5, VSF not only eliminates the undesired content but also changes other components. Specifically, in the starry night examples, the city has disappeared. Also, in Figure 5 (right), the car is still reflected in the image with the proposed method.
- Figures should be well illustrated. ( font
