VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip

Wenqi Guo; Shan Du

arXiv:2508.10931·cs.CV·February 20, 2026

3 Reviews

TL;DR

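The paper presents VSF, a simple and efficient method that improves negative prompt guidance in few-step diffusion and flow-matching models by flipping the sign of the attention values that come from negative-prompt tokens, leading to better negative prompt adherence in image and video generation while keeping image quality competitive.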

Contribution

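Introduces Value Sign Flip (VSF), a technique that strengthens negative guidance in few-step diffusion models with small computational overhead and that works with both MMDiT-style architectures (e.g., Stable Diffusion 3.5 Turbo) and cross-attention-based models (e.g., Wan).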

Findings

01

VSF outperforms prior negative guidance methods in few-step models.

02

VSF improves negative prompt adherence in both static image and video generation.

03

VSF maintains competitive image quality while enhancing guidance effectiveness.

Abstract

We introduce Value Sign Flip (VSF), a simple and efficient method for incorporating negative prompt guidance in few-step diffusion and flow-matching image generation models. Unlike existing approaches such as classifier-free guidance (CFG), NASA, and NAG, VSF dynamically suppresses undesired content by flipping the sign of attention values from negative prompts. Our method requires only small computational overhead and integrates effectively with MMDiT-style architectures such as Stable Diffusion 3.5 Turbo, as well as cross-attention-based models like Wan. We validate VSF on challenging datasets with complex prompt pairs and demonstrate superior performance in both static image and video generation tasks. Experimental results show that VSF significantly improves negative prompt adherence compared to prior methods in few-step models, and even CFG in non-few-step models, while maintaining…
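The core idea in the abstract, flipping the sign of attention values that come from negative-prompt tokens, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head in which image-token queries attend jointly to positive and negative prompt tokens, and the function name, argument names, and the strength parameter `alpha` are all hypothetical. The paper's actual formulation may include further details that this page does not show.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_value_sign_flip(q, k_pos, v_pos, k_neg, v_neg, alpha=1.0):
    """Single-head attention over positive and negative prompt tokens.

    q:            (n_img, d) image-token queries
    k_pos, v_pos: (n_pos, d) keys/values of the positive prompt
    k_neg, v_neg: (n_neg, d) keys/values of the negative prompt
    alpha:        hypothetical strength of the negative guidance (assumption)
    """
    d = q.shape[-1]
    # Keys are left untouched, so attention weights still measure how
    # strongly each image token matches each prompt token.
    k = np.concatenate([k_pos, k_neg], axis=0)
    # Only the negative-prompt values are sign-flipped (and scaled).
    v = np.concatenate([v_pos, -alpha * v_neg], axis=0)
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k_pos, v_pos = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))
k_neg, v_neg = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))

out, weights = attention_with_value_sign_flip(q, k_pos, v_pos, k_neg, v_neg, alpha=2.0)
```

In this sketch an image token that attends strongly to a negative-prompt token receives that token's value with the opposite sign, so the suppression scales with the attention weight at each position. This is one way to read "dynamically suppresses undesired content", and it needs no second forward pass, unlike classifier-free guidance.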

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The idea of this paper is simple, but I like how the authors have distilled knowledge from the literature and, building on that and on their solid understanding of the attention mechanism, proposed this simple idea.

Weaknesses

1) There are some grammatical errors and confusing parts in the paper that need to be addressed:
- Should $x_{t-1}$ be $x_{t+1}$ in Eq. (1)?
- line 166: *"The method NASA applies the guidance in intermediate states instead of the predicted noise or velocity."* This statement is somewhat ambiguous.
- line 188: *"However, it also limits the model’s ability to follow negative prompt guidance if the constraint is set to be too tight ..."* This sentence could be improved for clarity and readabili

Reviewer 02 · Rating 8 · Confidence 4

Strengths

Originality:
1. The proposed method is new.

Quality:
1. The related work section covers relevant literature: CFG, Negative Guidance, and Few-step generators.
2. The competitors are aggressively chosen, even Nano Banana.
3. Discussion is thorough:
   1. trade-off between positive and negative prompts
   2. trade-off between quality and negative prompts
   3. attention maps
   4. ablation study

Clarity:
1. The explanations are kind to the readers, step-by-step from NASA to the proposed method.

Weaknesses

Minor:
1. Please properly use \citet and \citep.
2. Fonts are too small in the figures.
3. Is “unbrulla” a typo, or is there a message?

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The proposed method is simple but effective.
- The proposed method can be applied to a few-step model.

Weaknesses

- The proposed method is not novel. Manipulating attention has been employed for image editing with diffusion models and flow-matching models (e.g., [Attend-and-Excite], [self-guidance], [BoxDiff]).
- In Figure 5, VSF not only eliminates the undesired content but also changes other components. Specifically, in the starry night examples, the city is gone. Also, in Figure 5 (right), the car is still reflected in the image with the proposed method.
- Figures should be well illustrated. ( font
