Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models
Semin Kim, Yeonwoo Cha, Jaehoon Yoo, Seunghoon Hong

TL;DR
RATTPO is a flexible, reward-agnostic prompt optimization method for text-to-image diffusion models that improves prompts across various reward scenarios without needing reward-specific adjustments, enhancing efficiency and performance.
Contribution
Introduces RATTPO, a novel test-time prompt optimization approach that is adaptable to multiple reward models without modification, outperforming existing methods in efficiency and effectiveness.
Findings
RATTPO improves prompt quality across diverse reward models.
It runs 4.8 times faster than naive search baselines.
It achieves performance comparable to reward-specific fine-tuning when given a sufficient inference budget.
Abstract
We investigate a general approach for improving user prompts in text-to-image (T2I) diffusion models by finding prompts that maximize a reward function specified at test-time. Although diverse reward models are used for evaluating image generation, existing automated prompt engineering methods typically target specific reward configurations. Consequently, these specialized designs exhibit suboptimal performance when applied to new prompt engineering scenarios involving different reward models. To address this limitation, we introduce RATTPO (Reward-Agnostic Test-Time Prompt Optimization), a flexible test-time optimization method applicable across various reward scenarios without modification. RATTPO iteratively searches for optimized prompts by querying large language models (LLMs) without requiring reward-specific task descriptions. Instead, it uses the optimization trajectory…
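The iterative search described in the abstract can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: it is a generic propose-generate-score loop written under assumptions. The names `optimize_prompt`, `propose`, `generate`, `reward`, `n_iters`, and `top_k` are all hypothetical, and the LLM, the diffusion model, and the reward model are passed in as plain callables so that the loop itself never depends on which reward is used.

```python
from typing import Callable, List, Tuple

# (prompt, reward score) pairs collected during the search
History = List[Tuple[str, float]]


def optimize_prompt(
    initial_prompt: str,
    propose: Callable[[str, History], List[str]],  # hypothetical LLM wrapper
    generate: Callable[[str], object],             # hypothetical T2I model wrapper
    reward: Callable[[object], float],             # any image-scoring function
    n_iters: int = 5,
    top_k: int = 3,
) -> Tuple[str, float]:
    """Return the best (prompt, score) pair found within the search budget."""
    # Score the user's prompt first so it remains a valid fallback.
    history: History = [(initial_prompt, reward(generate(initial_prompt)))]
    for _ in range(n_iters):
        # Assumption: the proposer sees only the top-k scored prompts so far,
        # standing in for "the optimization trajectory" in the abstract.
        context = sorted(history, key=lambda item: item[1], reverse=True)[:top_k]
        for candidate in propose(initial_prompt, context):
            image = generate(candidate)
            history.append((candidate, reward(image)))
    return max(history, key=lambda item: item[1])
```

Because `reward` is an opaque callable, swapping in a different reward model changes nothing in the loop, which is the sense of "reward-agnostic" this sketch tries to convey. Each iteration still calls `generate` once per candidate, so cost grows with both `n_iters` and the number of proposals.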
Peer Reviews
Decision: Submitted to ICLR 2026
Extensive experiments across 8 reward setups, showing versatility and efficiency.
- Computational cost: Despite efficiency gains, RATTPO requires multiple image generations per iteration (line 7, Algorithm 1). Potential optimizations (e.g., caching) are unexplored.
- Prompt length constraints: The impact of initial prompt length on optimization is not analyzed.
- Limited novelty: iterative prompt optimization is trivial.
1. The experimental results look very promising, especially in Figure 1, where they show great test-time scaling.
2. The algorithm is fairly simple and easy to implement.
3. The paper is well written and easy to understand.
1. My main concern about the paper is regarding its novelty. The idea of both LLM as automated prompt generator and as a judge/hint giver has been thoroughly explored both in the context of LLM self-improvement/RLAIF [2,3,4] [(Madaan et al., 2023; Wang et al., 2023a; Shinn et al., 2023) from the paper] and text-to-image generation [1] [(Yang et al., 2024; Fernando et al., 2023; Du et al., 2024; He et al., 2024; Mañas et al., 2024) from the paper]. In fact, the algorithm proposed in this paper is
- The motivation and significance of the proposed scenario are clearly articulated and highly relevant.
- The experimental results convincingly demonstrate the superiority of the proposed method over existing approaches.
- The paper is poorly written, with an overly brief description of the methodology. It lacks essential details about the input prompts used for the first LLM to generate candidate prompts for image generation, the input prompts for the second LLM, and the specific format of the "hint" texts, all of which are critical to understanding the core approach.
- The paper lacks a clear diagram illustrating the overall workflow of the proposed method; Algorithm 1 alone is insufficient for conveying the p
