Toward Early Quality Assessment of Text-to-Image Diffusion Models
Huanlei Guo, Hongxin Wei, Bingyi Jing

TL;DR
This paper introduces Probe-Select, a method that predicts final image quality from early diffusion model activations, so that unpromising seeds can be terminated early. This reduces sampling cost and improves the quality of the selected images in text-to-image generation.
Contribution
The paper presents a plug-in module that uses early denoiser activations to predict final image quality, reducing resource usage and improving the accuracy of seed selection.
Findings
Early activations encode stable structural information.
Evaluation after only 20% of the diffusion steps is enough to predict final quality.
Sampling cost is reduced by over 60% with improved image quality.
Abstract
Recent text-to-image (T2I) diffusion and flow-matching models can produce highly realistic images from natural language prompts. In practical scenarios, T2I systems are often run in a "generate-then-select" mode: many seeds are sampled and only a few images are kept for use. However, this pipeline is highly resource-intensive since each candidate requires tens to hundreds of denoising steps, and evaluation metrics such as CLIPScore and ImageReward are post-hoc. In this work, we address this inefficiency by introducing Probe-Select, a plug-in module that enables efficient evaluation of image quality within the generation process. We observe that certain intermediate denoiser activations, even at early timesteps, encode a stable coarse structure (object layout and spatial arrangement) that strongly correlates with final image fidelity. Probe-Select exploits this property by predicting…
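The early-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `early_select`, its parameters, and the toy scorer are hypothetical names introduced here, and `probe_score` merely stands in for whatever quality predictor is read from early denoiser activations. The sketch only shows the control flow (probe every seed for a fraction of the steps, keep the top few, finish only those) and the resulting step count.

```python
from typing import Callable, List, Tuple


def early_select(
    seeds: List[int],
    total_steps: int,
    probe_fraction: float,
    keep: int,
    probe_score: Callable[[int], float],
) -> Tuple[List[int], int]:
    """Rank seeds by an early quality estimate and count denoising steps used.

    probe_score is a hypothetical stand-in: in a real system it would run the
    denoiser for the probe steps and score intermediate activations.
    """
    # Number of steps every seed runs before it is scored.
    probe_steps = max(1, round(total_steps * probe_fraction))

    # Keep the `keep` seeds with the highest predicted quality.
    ranked = sorted(seeds, key=probe_score, reverse=True)
    survivors = ranked[:keep]

    # All seeds pay for the probe steps; only survivors pay for the rest.
    steps_used = len(seeds) * probe_steps + len(survivors) * (total_steps - probe_steps)
    return survivors, steps_used


# Toy settings (assumptions, not values from the paper apart from the 20% probe point).
seeds = list(range(10))
total_steps = 50
survivors, steps_used = early_select(
    seeds,
    total_steps,
    probe_fraction=0.2,
    keep=2,
    probe_score=lambda seed: (seed * 7) % 10,  # deterministic placeholder score
)

baseline_steps = len(seeds) * total_steps  # every seed run to completion
savings = 1 - steps_used / baseline_steps
print(survivors, steps_used, baseline_steps, round(savings, 2))
```

With these toy settings, 10 seeds are probed for 10 steps each and 2 are finished, so 180 denoising steps are used instead of 500. That figure is plain arithmetic on the assumed seed count and keep count; it ignores the cost of the probe itself and is not a reproduction of the paper's reported savings.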
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
