TL;DR
VAGS introduces a velocity-adaptive guidance scale for image editing and generation that dynamically adjusts guidance strength based on local semantic and structural alignment, improving fidelity without additional training.
Contribution
It proposes a training-free, adaptive guidance scale method that enhances image editing and generation quality by aligning guidance strength with model dynamics.
Findings
VAGS outperforms fixed-scale CFG in structural fidelity and image quality.
VAGS improves results across multiple datasets, in both image editing and image generation.
The method requires no fine-tuning or extra passes.
Abstract
Classifier-free guidance (CFG) is the primary control over how strongly text semantics move a flow-based sampler, yet standard practice holds its scale fixed across the entire ODE trajectory. This is a fundamental mismatch: early steps are noise-dominated and carry weak semantic signal, while late steps commit image structure and demand stronger directional commitment; more critically, the value of any guidance strength depends on whether the guided velocity is consistent with the model's current dynamics or working against them. We propose Velocity-Adaptive Guidance Scale (VAGS), a training-free replacement that multiplies the nominal scale by a bounded factor combining a temporal signal-level term with the cosine similarity between task-relevant velocity fields. For inversion-free editing, VAGS measures the alignment between source- and target-guided velocities, so edit…
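The abstract describes the mechanism only at a high level: the nominal CFG scale is multiplied by a bounded factor built from a temporal signal-level term and the cosine similarity between task-relevant velocities (source- and target-guided velocities in the editing case). The NumPy sketch below illustrates that structure. It is not the paper's formula, and everything specific in it is an assumption:
- the rectified-flow time convention (t = 1 is pure noise, t = 0 is clean) and the signal-level term 1 - t;
- the equal-weight blend `alpha`;
- mapping the cosine similarity from [-1, 1] to [0, 1];
- the choice that stronger alignment raises the scale;
- the bounds `lo` and `hi`.

The helper names (`signal_level`, `adaptive_factor`, `guided_velocity`) are hypothetical.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine similarity between two velocity fields, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def signal_level(t: float) -> float:
    """ASSUMPTION: rectified-flow convention x_t = (1 - t) * x0 + t * noise,
    so the signal level grows from 0 (t = 1, pure noise) to 1 (t = 0, clean)."""
    return 1.0 - t


def adaptive_factor(t: float, v_src: np.ndarray, v_tgt: np.ndarray,
                    lo: float = 0.5, hi: float = 1.5, alpha: float = 0.5) -> float:
    """Hypothetical bounded multiplier on the nominal guidance scale.

    ASSUMPTIONS (not from the paper): a convex blend of the temporal term and
    the alignment term, alignment mapped from [-1, 1] to [0, 1], higher
    alignment -> larger factor, and the result kept inside [lo, hi].
    """
    align = 0.5 * (cosine_similarity(v_src, v_tgt) + 1.0)  # in [0, 1]
    s = alpha * signal_level(t) + (1.0 - alpha) * align     # in [0, 1]
    return float(np.clip(lo + (hi - lo) * s, lo, hi))


def guided_velocity(v_uncond: np.ndarray, v_cond: np.ndarray,
                    w_nominal: float, factor: float) -> np.ndarray:
    """Standard CFG combination, with the nominal scale multiplied by `factor`."""
    return v_uncond + (w_nominal * factor) * (v_cond - v_uncond)


# Toy usage: random arrays stand in for model velocity predictions.
# In a real sampler these would be recomputed by the model at every step.
rng = np.random.default_rng(0)
shape = (4, 8, 8)
v_uncond = rng.standard_normal(shape)  # stand-in: unconditional prediction
v_cond = rng.standard_normal(shape)    # stand-in: target-prompt prediction
v_src = rng.standard_normal(shape)     # stand-in: source-guided velocity
w = 7.0
v_tgt = guided_velocity(v_uncond, v_cond, w, 1.0)  # target-guided, nominal scale
for t in (0.9, 0.5, 0.1):
    f = adaptive_factor(t, v_src, v_tgt)
    v = guided_velocity(v_uncond, v_cond, w, f)
    print(f"t={t:.1f}  factor={f:.3f}")
```

With `factor = 1` the update reduces to ordinary fixed-scale CFG. The adaptive term therefore acts as a drop-in multiplier on the nominal scale and needs no retraining, which matches the training-free framing above.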
