Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models

Chubin Chen; Jiashu Zhu; Xiaokun Feng; Nisha Huang; Chen Zhu; Meiqi Wu; Fangyuan Mao; Jiahong Wu; Xiangxiang Chu; Xiu Li

arXiv:2508.12880·cs.CV·March 5, 2026

Open Access 3 Reviews

TL;DR

This paper introduces S$^2$-Guidance, a training-free stochastic guidance method that refines diffusion model outputs by leveraging stochastic sub-networks, improving quality over traditional classifier-free guidance in text-to-image and text-to-video tasks.

Contribution

It proposes a novel stochastic guidance technique that refines diffusion model predictions without additional training, addressing CFG's limitations and enhancing output quality.

Findings

01. S$^2$-Guidance outperforms CFG and other guidance methods in experiments.

02. The method improves semantic coherence and output quality.

03. Extensive experiments validate the effectiveness across multiple generation tasks.

Abstract

Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis on Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S$^2$-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs.…
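The mechanism described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy residual "denoiser", the helper names (`make_blocks`, `denoise`, `sample_keep_mask`, `s2_guidance_step`), and in particular the way the sub-network prediction is combined with the CFG output. The sketch only shows the general shape of the idea: per step, run the usual conditional and unconditional passes, run one extra conditional pass with randomly dropped blocks, and steer the result away from that weaker prediction.

```python
import numpy as np

def make_blocks(num_blocks, dim, seed=0):
    """Hypothetical toy model: one small weight matrix per residual block."""
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_blocks)]

def denoise(x, cond, blocks, keep_mask=None):
    """Toy residual denoiser. A dropped block is skipped (identity mapping)."""
    if keep_mask is None:
        keep_mask = np.ones(len(blocks), dtype=bool)
    h = x + cond
    for weight, keep in zip(blocks, keep_mask):
        if keep:
            h = h + np.tanh(h @ weight)
    return h

def sample_keep_mask(num_blocks, drop_ratio, rng):
    """Randomly choose which blocks to drop; at least one block is dropped."""
    num_drop = max(1, int(round(drop_ratio * num_blocks)))
    dropped = rng.choice(num_blocks, size=num_drop, replace=False)
    mask = np.ones(num_blocks, dtype=bool)
    mask[dropped] = False
    return mask

def cfg(eps_cond, eps_uncond, scale):
    """Standard classifier-free guidance combination."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def s2_guidance_step(x, cond, blocks, scale, omega, drop_ratio, rng):
    """One guided prediction using three forward passes (CFG needs two)."""
    eps_cond = denoise(x, cond, blocks)
    eps_uncond = denoise(x, np.zeros_like(cond), blocks)
    mask = sample_keep_mask(len(blocks), drop_ratio, rng)
    eps_sub = denoise(x, cond, blocks, keep_mask=mask)
    # ASSUMED combination, not the paper's formula: take the CFG output and
    # push it away from the stochastic sub-network's prediction.
    return cfg(eps_cond, eps_uncond, scale) + omega * (eps_cond - eps_sub)

rng = np.random.default_rng(0)
blocks = make_blocks(num_blocks=8, dim=4)
x = rng.normal(size=(2, 4))
cond = rng.normal(size=(2, 4))
guided = s2_guidance_step(x, cond, blocks, scale=5.0, omega=0.5,
                          drop_ratio=0.25, rng=rng)
print(guided.shape)
```

With `omega=0` this sketch reduces to plain CFG, which makes the extra cost explicit: the only addition is the third, block-dropped forward pass.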

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper is well-written and easy to understand.
- Activating an internal “weak” predictor by randomly omitting blocks during inference is simple but effective. Unlike distillation-based methods and external weak models, this algorithm is training-free and plug-and-play. It can therefore be applied directly to the generation process, which is currently based on CFG.
- An additional stochastic forward pass per step keeps complexity low compared to many other alternatives with guidance or self

Weaknesses

- In Appendix A, the authors provide an analysis of $S^2$-Guidance and Naive $S^2$-Guidance. The authors "posit" that $S^2$-Guidance is an approximately unbiased estimator of the "expected guidance" $G_{Naive}$. However, the "unbiasedness" of $G_{S^2-Guidance}$ is assumed, not shown. In addition, $\mu_{post}$ denotes a Bayesian posterior mean, but the algorithm actually only defines an empirical average of the predictions of the sub-network obtained by randomly dropping modules, which cannot f

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- Training-free inference procedure that does not require auxiliary models.
- Broad experimental coverage across multiple transformer-based diffusion backbones (images and video) with clear qualitative examples.
- Ablation studies on several design choices (ω, drop ratio) and comparisons against a range of CFG variants.
- The idea is easy to understand and implement for transformer architectures, and the empirical results are consistently positive.

Weaknesses

- Computational overhead: per denoising step the method requires an additional denoiser call (three evaluations vs. two for standard CFG). The efficiency claim is mainly relative to a heavier “naive” multi-sample variant rather than to CFG. The phrasing of the term "efficient" in the main text is therefore slightly misleading.
- Architecture dependence: all demonstrations are on transformer-based diffusion models and the mechanism relies on dropping entire residual blocks with fixed shapes. It i

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- The comparison and visualization of different guidance methods on GMM data are clear and well-presented.
- Using block dropout instead of parameter dropout contributes to inference efficiency, akin to structured pruning, compared to conventional dropout.
- The observation that the specific choice of dropout block has limited impact on performance is important.
- Replacing manual block selection with stochastic dropout simplifies usage while maintaining or even improving general performance, ma

Weaknesses

There are several concerns regarding the use of block dropout. Dropping entire model blocks could severely degrade prediction quality if applied to critical blocks, potentially corrupting the model’s behavior. While AutoGuidance adopts parameter-level dropout, this paper employs the stronger perturbation of block-level dropout without providing sufficient motivation or references for this design choice. Indeed, as mentioned around L250-L254, applying naive S$^2$-Guidance to key blocks drastically

Taxonomy

Topics: Model Reduction and Neural Networks