Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
Johannes Schusterbauer, Ming Gui, Yusong Li, Pingchuan Ma, Felix Krause, Björn Ommer

TL;DR
This paper introduces Patch Forcing, a framework for adaptive image generation that allocates compute across image patches according to their difficulty, improving image quality in diffusion- and flow-based models.
Contribution
It proposes a patch-level noise scale approach with a difficulty-aware sampler and lightweight head, enabling more efficient and effective adaptive denoising in diffusion-based image synthesis.
Findings
Patch-level timesteps improve image quality over standard baselines.
The difficulty head enables dynamic compute allocation to harder regions.
Patch Forcing achieves superior results on ImageNet and scales to text-to-image synthesis.
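As a rough illustration of what "dynamic compute allocation to harder regions" could look like, the sketch below maps per-patch difficulty scores to per-patch step counts. This summary does not describe the paper's actual allocation rule, so the function name `allocate_steps`, the linear mapping, and the `min_steps`/`max_steps` bounds are all assumptions made for illustration.

```python
import numpy as np

def allocate_steps(difficulty, min_steps=4, max_steps=16):
    """Hypothetical rule: map difficulty scores linearly to step counts.

    Patches with the lowest score get `min_steps`, those with the highest
    score get `max_steps`. This is an illustrative assumption, not the
    allocation scheme used by Patch Forcing.
    """
    d = np.asarray(difficulty, dtype=float)
    span = d.max() - d.min()
    # If all patches are equally difficult, give every patch the minimum.
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    return np.rint(min_steps + norm * (max_steps - min_steps)).astype(int)

# Example: three patches with increasing (made-up) difficulty scores.
steps = allocate_steps([0.0, 0.5, 1.0])
```

In a real model the scores would come from the difficulty head; here they are placeholder values.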
Abstract
Diffusion- and flow-based models usually allocate compute uniformly across space, updating all patches with the same timestep and number of function evaluations. While convenient, this ignores the heterogeneity of natural images: some regions are easy to denoise, whereas others benefit from more refinement or additional context. Motivated by this, we explore patch-level noise scales for image synthesis. We find that naively varying timesteps across image tokens performs poorly, as it exposes the model to overly informative training states that do not occur at inference. We therefore introduce a timestep sampler that explicitly controls the maximum patch-level information available during training, and show that moving from global to patch-level timesteps already improves image generation over standard baselines. By further augmenting the model with a lightweight per-patch difficulty…
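The idea of a timestep sampler that bounds patch-level information can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's sampler: it assumes timesteps in [0, 1] with 1 meaning pure noise, a linear interpolation between data and noise, and a hypothetical `max_spread` parameter that limits how much cleaner the most informative patch can be than the noisiest one.

```python
import numpy as np

def sample_patch_timesteps(num_patches, max_spread=0.3, rng=None):
    """Draw one timestep per patch within a bounded window (assumed scheme).

    All timesteps fall in an interval of width at most `max_spread`, so no
    patch is much cleaner (more informative) than the others.
    """
    rng = np.random.default_rng() if rng is None else rng
    t_floor = rng.uniform(0.0, 1.0)              # cleanest level allowed
    t_ceiling = min(1.0, t_floor + max_spread)   # noisiest level allowed
    return rng.uniform(t_floor, t_ceiling, size=num_patches)

def noise_patches(x, t, rng=None):
    """Noise each patch to its own timestep via linear interpolation.

    x: array of shape (num_patches, dim); t: array of shape (num_patches,).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x.shape)
    return (1.0 - t)[:, None] * x + t[:, None] * eps

rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 8))           # 16 placeholder patches
t = sample_patch_timesteps(16, max_spread=0.3, rng=rng)
noisy = noise_patches(patches, t, rng=rng)
```

Setting `max_spread=0` recovers a single global timestep shared by all patches, while `max_spread=1` corresponds to the naive case of fully independent per-patch timesteps that the abstract reports as performing poorly.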
