UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation
Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Huang, Wanli Ouyang

TL;DR
This paper introduces UniGRPO, a unified reinforcement learning framework that enhances reasoning-driven visual generation by jointly optimizing text and image policies, enabling scalable, multi-round interleaved generation.
Contribution
It proposes a novel unified RL approach for interleaved text and image generation, with specific modifications to improve scalability and robustness in multi-turn scenarios.
Findings
Improved image quality through reasoning-based generation.
Scalable approach for multi-round interleaved generation.
Effective mitigation of reward hacking with new regularization.
Abstract
Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image generation. To advance this direction, we propose a unified reinforcement learning framework tailored for interleaved generation. We validate our approach on its fundamental unit: a single round of reasoning-driven image generation, where the model first expands the user prompt through reasoning, followed by image synthesis. Formulating this multimodal generation process as a Markov Decision Process with sparse terminal rewards, we introduce UniGRPO to jointly optimize text and image generation policies using GRPO. Adopting a minimalist methodology to avoid over-design, we leverage established training recipes for both modalities by seamlessly integrating standard GRPO for reasoning and FlowGRPO…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques
