ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata

TL;DR
ReNO is an inference-time method for one-step text-to-image models that optimizes the initial noise against reward signals to improve image quality and prompt fidelity. Within a 20-50 second compute budget, ReNO-enhanced models outperform existing open-source models.
Contribution
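The paper proposes Reward-based Noise Optimization (ReNO), an inference-time technique that improves T2I models by optimizing the initial noise against one or more human preference reward models. This avoids two limitations of reward-based fine-tuning: reward hacking and poor generalization to unseen prompt distributions.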
Findings
ReNO improves the performance of four one-step T2I models on the T2I-CompBench and GenEval benchmarks.
Within a computational budget of 20-50 seconds, ReNO-enhanced models outperform open-source models.
In user studies, ReNO-enhanced models are preferred nearly twice as often as SDXL.
Abstract
Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance…
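The core loop described in the abstract, gradient ascent on the initial noise for 50 iterations, can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration: `generator` is a tiny tanh map standing in for a one-step T2I model, `reward` is a negative squared distance standing in for a human preference reward model, and the weights, target, and learning rate are arbitrary. It is not the authors' implementation and omits whatever regularization or optimizer details the paper uses.

```python
import math


def generator(noise, weights):
    """Toy stand-in for a one-step T2I model: x = tanh(W @ noise)."""
    return [math.tanh(sum(w * n for w, n in zip(row, noise))) for row in weights]


def reward(image, target):
    """Toy stand-in for a preference reward model: higher is better."""
    return -sum((x - t) ** 2 for x, t in zip(image, target))


def reward_grad_wrt_noise(noise, weights, target):
    """Analytic gradient of reward(generator(noise)) with respect to the noise."""
    image = generator(noise, weights)
    # d reward / d x_i = -2 (x_i - t_i); d tanh(p_i) / d p_i = 1 - x_i^2
    upstream = [-2.0 * (x - t) * (1.0 - x * x) for x, t in zip(image, target)]
    # p = W @ noise, so d p_i / d noise_j = W[i][j]
    return [
        sum(upstream[i] * weights[i][j] for i in range(len(weights)))
        for j in range(len(noise))
    ]


def optimize_noise(noise, weights, target, steps=50, lr=0.1):
    """Plain gradient ascent on the initial noise; model weights stay frozen."""
    noise = list(noise)
    history = [reward(generator(noise, weights), target)]
    for _ in range(steps):
        grad = reward_grad_wrt_noise(noise, weights, target)
        noise = [n + lr * g for n, g in zip(noise, grad)]  # ascent step
        history.append(reward(generator(noise, weights), target))
    return noise, history


if __name__ == "__main__":
    # Hypothetical values chosen only to make the demo deterministic.
    demo_weights = [[0.8, 0.2, 0.0], [-0.2, 0.8, 0.0], [0.0, 0.0, 0.9]]
    demo_target = [0.5, -0.5, 0.2]
    _, rewards = optimize_noise([0.0, 0.0, 0.0], demo_weights, demo_target)
    print(f"reward before: {rewards[0]:.4f}, after: {rewards[-1]:.4f}")
```

Only the noise vector is updated; the generator weights are never touched, which is what distinguishes this kind of inference-time optimization from reward-based fine-tuning. In a real setting the gradient would come from automatic differentiation through the T2I model and the reward models rather than a hand-derived formula.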
Taxonomy
Topics: Handwritten Text Recognition Techniques
Methods: Diffusion
