PromptRL: Prompt Matters in RL for Flow-Based Image Generation
Fu-Yun Wang, Han Zhang, Michael Gharbi, Hongsheng Li, Taesung Park

TL;DR
PromptRL places a language model inside the reinforcement learning loop for flow-based image generation, where it acts as a trainable prompt refiner. The result is better sample efficiency, greater robustness to prompt rewording, and higher scores on multiple benchmarks.
Contribution
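The paper presents PromptRL, a framework that incorporates language models as trainable prompt refinement agents within the flow-based RL optimization loop. It targets two limitations of current pipelines, insufficient generation diversity and prompt overfitting, and reports state-of-the-art results with fewer rollouts.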
Findings
Achieves state-of-the-art scores on multiple benchmarks.
Reduces rollouts by over 2× compared to naive RL.
Improves large-scale image editing performance with minimal additional data.
Abstract
Flow matching models (FMs) have revolutionized text-to-image (T2I) generation, with reinforcement learning (RL) serving as a critical post-training strategy for alignment with reward objectives. In this research, we show that current RL pipelines for FMs suffer from two underappreciated yet important limitations: sample inefficiency due to insufficient generation diversity, and pronounced prompt overfitting, where models memorize specific training formulations and exhibit dramatic performance collapse when evaluated on semantically equivalent but stylistically varied prompts. We present PromptRL (Prompt Matters in RL for Flow-Based Image Generation), a framework that incorporates language models (LMs) as trainable prompt refinement agents directly within the flow-based RL optimization loop. This design yields two complementary benefits: rapid development of sophisticated prompt…
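To make the idea of a trainable prompt refiner inside an RL loop concrete, here is a minimal toy sketch. It is not the authors' algorithm: the language model is replaced by a softmax policy over three hypothetical rewrite templates (`REWRITES`), the flow model and reward model are collapsed into a placeholder `stub_reward`, and the update is plain REINFORCE with a running baseline. All names and hyperparameters are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical rewrite templates standing in for a language model's refinements.
REWRITES = [
    "{p}",
    "{p}, highly detailed",
    "a photo of {p}, sharp focus, natural lighting",
]


def stub_reward(prompt: str) -> float:
    """Placeholder for 'generate an image, then score it' (assumption).

    Counts how many quality cue words appear in the refined prompt.
    """
    cues = ("detailed", "sharp", "lighting")
    return float(sum(cue in prompt for cue in cues))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_refiner(prompts, steps=2000, lr=0.1, seed=0):
    """Train the toy refiner policy with REINFORCE; returns template probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(REWRITES)
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(REWRITES)), weights=probs)[0]
        refined = REWRITES[action].format(p=rng.choice(prompts))
        reward = stub_reward(refined)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    probs = train_refiner(["a red bicycle", "two cats on a sofa"])
    for template, p in zip(REWRITES, probs):
        print(f"{p:.3f}  {template}")
```

In this toy setting the policy shifts probability toward the template that the stub reward favors. In the framework the abstract describes, the refiner is a language model and the reward comes from generated images, so this sketch only illustrates the shape of the loop: refine the prompt, score the outcome, update the refiner.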
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
