AsyncDiff: Asynchronous Timestep Conditioning for Enhanced Text-to-Image Diffusion Inference
Longhuan Xu, Feng Yin, Cunjian Chen

TL;DR
AsyncDiff introduces an asynchronous inference approach for text-to-image diffusion models, decoupling conditioning and update schedules with a learned timestep predictor, leading to improved image quality and control.
Contribution
It proposes a novel asynchronous inference mechanism with a learned timestep prediction module, enhancing control and quality in diffusion-based image synthesis.
Findings
Achieves consistent improvements on multiple datasets.
Effectively controls image detail and texture richness.
Operates efficiently with reduced steps.
Abstract
Text-to-image diffusion inference typically follows synchronized schedules, where the numerical integrator advances the latent state to the same timestep at which the denoiser is conditioned. We propose an asynchronous inference mechanism that decouples these two, allowing the denoiser to be conditioned at a different, learned timestep while keeping the image update schedule unchanged. A lightweight timestep prediction module (TPM), trained with Group Relative Policy Optimization (GRPO), selects a more feasible conditioning timestep based on the current state, effectively choosing a desired noise level to control image detail and textural richness. At deployment, a scaling hyper-parameter can be used to interpolate between the original and de-synchronized timesteps, enabling conservative or aggressive adjustments. To keep the study computationally affordable, we cap the inference at 15…
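The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_denoiser`, `toy_tpm`, the linear interpolation rule, the Euler update, and the 10-step schedule are all assumptions chosen for illustration. The one idea taken from the abstract is that the integrator keeps its original schedule while the denoiser is conditioned on a separately chosen timestep, blended with the original one by a scaling hyper-parameter.

```python
import numpy as np

def interpolate_timestep(t_sync, t_pred, scale):
    # Assumed blending rule: scale=0 keeps the synchronized timestep,
    # scale=1 uses the predicted (de-synchronized) timestep fully.
    return t_sync + scale * (t_pred - t_sync)

def toy_denoiser(x, t_cond):
    # Hypothetical stand-in for the diffusion network; its output depends
    # on the conditioning timestep it is given.
    return -x * t_cond

def toy_tpm(x, t_sync):
    # Hypothetical stand-in for the learned timestep prediction module (TPM).
    # Here it simply proposes a slightly lower noise level.
    return float(np.clip(t_sync - 0.05, 0.0, 1.0))

def async_sample(x, timesteps, scale, denoiser=toy_denoiser, tpm=toy_tpm):
    """Euler-style loop: the update schedule is untouched, only the
    timestep passed to the denoiser changes."""
    used = []
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        t_pred = tpm(x, t_cur)                        # proposed conditioning timestep
        t_cond = interpolate_timestep(t_cur, t_pred, scale)
        used.append(t_cond)
        v = denoiser(x, t_cond)                       # conditioned at t_cond, not t_cur
        x = x + (t_next - t_cur) * v                  # step size from the original schedule
    return x, used

# Arbitrary 10-step schedule from t=1 to t=0 (illustrative only).
timesteps = np.linspace(1.0, 0.0, 11)
x0 = np.ones(4)
x_sync, used_sync = async_sample(x0, timesteps, scale=0.0)
x_async, used_async = async_sample(x0, timesteps, scale=1.0)
```

With `scale=0.0` the loop reduces to ordinary synchronized sampling; intermediate values give the conservative-to-aggressive adjustment the abstract mentions.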
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Data Compression Techniques
