How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models

Huixuan Zhang; Junzhe Zhang; Xiaojun Wan

arXiv:2506.08351 · cs.CV · June 11, 2025

TL;DR

This paper introduces Step AG, a simple adaptive guidance strategy for classifier-free guidance in diffusion models that reduces computational cost while maintaining high image quality and image-text alignment.

Contribution

It proposes a universally applicable adaptive guidance method that restricts classifier-free guidance to the early denoising steps, improving efficiency across various models and settings.
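The idea can be stated compactly as follows. The notation is ours and not necessarily the paper's: $\epsilon_\theta$ is the noise predictor, $c$ the prompt, $\varnothing$ the null prompt, $w$ the guidance scale, $N$ the number of denoising steps indexed $i = 0, \dots, N-1$ (with $i = 0$ the noisiest), $t_i$ the timestep at step $i$, and $K$ a hypothetical cutoff for the guided window. The second formula assumes that steps after the cutoff use the conditional prediction alone, which is equivalent to $w = 1$.

```latex
% Standard classifier-free guidance at step i
\tilde{\epsilon}_\theta(x_{t_i}, c) = \epsilon_\theta(x_{t_i}, \varnothing)
  + w \bigl( \epsilon_\theta(x_{t_i}, c) - \epsilon_\theta(x_{t_i}, \varnothing) \bigr)

% Guidance restricted to the first K denoising steps
\hat{\epsilon}_i =
\begin{cases}
  \tilde{\epsilon}_\theta(x_{t_i}, c), & 0 \le i < K \\
  \epsilon_\theta(x_{t_i}, c),         & K \le i < N
\end{cases}
```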

Findings

01. Achieves a 20-30% speedup in inference time.

02. Maintains high image quality and text-image alignment.

03. Is effective across different models, including video generation models.
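The speedup comes from skipping the unconditional forward pass on unguided steps. The arithmetic below is a back-of-the-envelope sketch, not the paper's measurement: it counts network evaluations only, assumes two passes per guided step and one per unguided step, and the 50-step schedule and window sizes are illustrative choices. The helper names `cfg_forward_passes` and `relative_cost` are hypothetical.

```python
def cfg_forward_passes(num_steps: int, guided_steps: int) -> int:
    """Network evaluations when CFG (two passes per step) is applied only to
    the first `guided_steps` of `num_steps` denoising steps and the remaining
    steps use a single conditional pass (an assumption of this sketch)."""
    if not 0 <= guided_steps <= num_steps:
        raise ValueError("guided_steps must lie in [0, num_steps]")
    return num_steps + guided_steps


def relative_cost(num_steps: int, guided_steps: int) -> float:
    """Forward passes as a fraction of full CFG, which needs 2 * num_steps."""
    return cfg_forward_passes(num_steps, guided_steps) / (2 * num_steps)


# Illustrative 50-step schedule with different (hypothetical) guided windows.
for guided in (50, 25, 15, 10):
    passes = cfg_forward_passes(50, guided)
    print(f"guided={guided:2d}  passes={passes}  relative={relative_cost(50, guided):.2f}")
```

Wall-clock time also depends on whether the conditional and unconditional passes are batched together and on fixed costs such as text encoding and decoding, so measured speedups need not match these ratios.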

Abstract

With the rapid development of text-to-vision diffusion models, classifier-free guidance has become the most prevalent conditioning method. However, it inherently requires twice as many model forward passes as unconditional generation, which significantly increases cost. Although a previous study introduced the concept of adaptive guidance, it lacked solid analysis and empirical results, so the earlier method could not be applied to general diffusion models. In this work, we present another perspective on applying adaptive guidance and propose Step AG, a simple, universally applicable adaptive guidance strategy. Our evaluations focus on both image quality and image-text alignment, and the results indicate that restricting classifier-free guidance to the first several denoising steps is sufficient for generating high-quality,…
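Restricting classifier-free guidance to the first several denoising steps can be sketched as a toy denoising loop. This is an illustration under stated assumptions, not the authors' implementation: `eps_model`, `step_ag_sample`, `guided_fraction`, and `toy_eps` are hypothetical names, the scalar state and the update `x <- x - step_size * eps` stand in for a real scheduler such as DDIM, and steps after the guided window are assumed to use the conditional prediction alone.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical noise-prediction interface: eps_model(x, t, cond) -> predicted noise.
# cond=None stands for the unconditional (null-prompt) branch.
EpsModel = Callable[[float, int, Optional[str]], float]


def step_ag_sample(
    eps_model: EpsModel,
    x_T: float,
    timesteps: Sequence[int],
    cond: str,
    guidance_scale: float = 7.5,
    guided_fraction: float = 0.3,  # hypothetical knob: share of early steps using CFG
    step_size: float = 0.1,
) -> Tuple[float, int]:
    """Toy loop applying CFG only on the first steps (timesteps ordered noisiest first).

    Returns the final state and the number of model forward passes used.
    The update rule is a placeholder, not a real DDPM/DDIM scheduler.
    """
    if not 0.0 <= guided_fraction <= 1.0:
        raise ValueError("guided_fraction must lie in [0, 1]")
    num_guided = int(round(guided_fraction * len(timesteps)))
    x = x_T
    forward_passes = 0
    for i, t in enumerate(timesteps):
        eps_cond = eps_model(x, t, cond)
        forward_passes += 1
        if i < num_guided:
            # Standard CFG combination: uncond + w * (cond - uncond).
            eps_uncond = eps_model(x, t, None)
            forward_passes += 1
            eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        else:
            # Outside the guided window: conditional prediction only (assumed).
            eps = eps_cond
        x = x - step_size * eps
    return x, forward_passes


def toy_eps(x: float, t: int, cond: Optional[str]) -> float:
    # Toy predictor: the conditional branch pulls toward 1.0 and the
    # unconditional branch toward 0.0, so guidance has a visible effect.
    target = 0.0 if cond is None else 1.0
    return x - target


steps = list(range(999, -1, -20))  # 50 illustrative timesteps, noisiest first
for frac in (1.0, 0.3, 0.0):
    _, passes = step_ag_sample(toy_eps, 0.0, steps, "a photo of a cat", guided_fraction=frac)
    print(frac, passes)
```

Only the unconditional call is skipped on later steps, so the conditional branch (and whatever batching a real pipeline uses for it) is unchanged.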

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Focus · Diffusion