Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models
Haosen Li, Wenshuo Chen, Lei Wang, Shaofeng Liang, Bowen Tian, Soning Lai, Yutao Yue

TL;DR
This paper introduces Spatial Adaptive Multi Guidance (SAMG), a novel method for diffusion models that dynamically adjusts guidance strength across spatial regions, improving semantic fidelity and structural consistency in generated images and videos.
Contribution
The paper presents a geometry-inspired, training-free guidance technique that adaptively modulates guidance scales, addressing the limitations of uniform classifier-free guidance in diffusion models.
Findings
SAMG improves semantic alignment and structural integrity in generated visuals.
SAMG reduces artifacts and enhances temporal smoothness in videos.
Experiments show SAMG outperforms standard guidance methods across multiple architectures.
Abstract
Diffusion models have achieved remarkable success in synthesizing complex static and temporal visuals, a breakthrough largely driven by Classifier-Free Guidance (CFG). However, despite its pivotal role in aligning generated content with textual prompts, standard CFG relies on a globally uniform scalar. This homogeneous amplification traps models in a well-documented "detail-artifact dilemma": low guidance scales fail to inject intricate semantics, while high scales inevitably cause structural degradation, color over-saturation, and temporal inconsistencies in videos. In this paper, we expose the physical root of this flaw through the lens of differential geometry. By analyzing Tweedie's Formula, we reveal that CFG intrinsically performs a tangential linear extrapolation. Because the natural data manifold is highly curved, this uniform linear step introduces a severe orthogonal…
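To make the "globally uniform scalar" concrete, here is a minimal NumPy sketch contrasting standard CFG with a spatially varying scale. It is an illustration under stated assumptions, not the paper's SAMG rule: the function names, the `(C, H, W)` array layout, and especially `toy_scale_map` (which simply rescales the per-pixel magnitude of the conditional/unconditional difference into a range) are hypothetical choices made for this example.

```python
import numpy as np


def cfg_uniform(eps_uncond, eps_cond, scale):
    """Standard CFG: one scalar guidance scale applied to every element."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def cfg_spatial(eps_uncond, eps_cond, scale_map):
    """CFG with a per-location scale.

    eps_uncond, eps_cond: arrays of shape (C, H, W).
    scale_map: array of shape (H, W), broadcast over the channel axis.
    """
    delta = eps_cond - eps_uncond
    return eps_uncond + scale_map[None, :, :] * delta


def toy_scale_map(eps_uncond, eps_cond, low=3.0, high=9.0):
    """Hypothetical heuristic (NOT the paper's method): map the per-pixel
    magnitude of the conditional/unconditional difference into [low, high]."""
    mag = np.linalg.norm(eps_cond - eps_uncond, axis=0)  # shape (H, W)
    span = mag.max() - mag.min()
    if span > 0:
        norm = (mag - mag.min()) / span
    else:
        norm = np.zeros_like(mag)
    return low + (high - low) * norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps_u = rng.standard_normal((4, 8, 8))  # stand-in for an unconditional prediction
    eps_c = rng.standard_normal((4, 8, 8))  # stand-in for a conditional prediction

    uniform = cfg_uniform(eps_u, eps_c, 7.5)
    spatial = cfg_spatial(eps_u, eps_c, toy_scale_map(eps_u, eps_c))
    print(uniform.shape, spatial.shape)
```

With a constant `scale_map`, `cfg_spatial` reduces exactly to `cfg_uniform`, which is the sense in which uniform CFG is a special case of any spatially adaptive scheme. How SAMG actually derives its per-region scales is not specified in the excerpt above.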
