Self-Evaluation Unlocks Any-Step Text-to-Image Generation
Xin Yu, Xiaojuan Qi, Zhengqi Li, Kai Zhang, Richard Zhang, Zhe Lin, Eli Shechtman, Tianyu Wang, Yotam Nitzan

TL;DR
Self-E introduces a novel self-evaluating training method for text-to-image generation that supports any-step inference, achieving high quality with fewer steps and outperforming traditional models in speed and scalability.
Contribution
It presents the first from-scratch, any-step text-to-image model using self-evaluation, bridging local supervision and global matching for efficient high-quality generation.
Findings
Excels in few-step generation with high quality.
Competitive with state-of-the-art models at 50 steps.
Performance improves monotonically with more inference steps.
Abstract
We introduce the Self-Evaluating Model (Self-E), a novel, from-scratch training approach for text-to-image generation that supports any-step inference. Self-E learns from data similarly to a Flow Matching model, while simultaneously employing a novel self-evaluation mechanism: it evaluates its own generated samples using its current score estimates, effectively serving as a dynamic self-teacher. Unlike traditional diffusion or flow models, it does not rely solely on local supervision, which typically necessitates many inference steps. Unlike distillation-based approaches, it does not require a pretrained teacher. This combination of instantaneous local learning and self-driven global matching bridges the gap between the two paradigms, enabling the training of a high-quality text-to-image model from scratch that excels even at very low step counts. Extensive experiments on large-scale…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
