UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Jie Liu; Zilyu Ye; Linxiao Yuan; Shenhan Zhu; Yu Gao; Jie Wu; Kunchang Li; Xionghui Wang; Xiaonan Nie; Weilin Huang; Wanli Ouyang

arXiv:2603.23500·cs.CV·March 25, 2026

TL;DR

This paper introduces UniGRPO, a unified reinforcement learning framework for reasoning-driven visual generation that jointly optimizes the text and image policies. It is validated on a single round of reasoning followed by image synthesis, the basic unit of multi-round interleaved generation.

Contribution

The paper proposes a unified RL approach for interleaved text and image generation, with modifications aimed at scalability and robustness in multi-turn scenarios.

Findings

01. Improved image quality through reasoning-based generation.

02. Scalable approach for multi-round interleaved generation.

03. Effective mitigation of reward hacking with new regularization.

Abstract

Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image generation. To advance this direction, we propose a unified reinforcement learning framework tailored for interleaved generation. We validate our approach on its fundamental unit: a single round of reasoning-driven image generation, where the model first expands the user prompt through reasoning, followed by image synthesis. Formulating this multimodal generation process as a Markov Decision Process with sparse terminal rewards, we introduce UniGRPO to jointly optimize text and image generation policies using GRPO. Adopting a minimalist methodology to avoid over-design, we leverage established training recipes for both modalities by seamlessly integrating standard GRPO for reasoning and FlowGRPO…
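The abstract is truncated here, so the paper's exact objective is not shown. As a rough illustration of the idea it describes (a sparse terminal reward shared by a reasoning trace and an image trajectory, optimized with GRPO), the sketch below computes the standard group-normalized advantage used by GRPO and copies it to every step of one rollout. The function names, the `eps` constant, and the step counts are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: normalize each terminal reward
    against the mean and standard deviation of its sampling group.
    `eps` is an assumed stabilizer for groups with identical rewards."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_steps(advantage, num_text_tokens, num_denoise_steps):
    """With a sparse terminal reward, every action in the rollout
    (reasoning tokens and image denoising steps alike) receives the
    same scalar advantage. Hypothetical helper for illustration."""
    return {
        "text": [advantage] * num_text_tokens,
        "image": [advantage] * num_denoise_steps,
    }


# Four rollouts sampled for the same prompt, each scored once at the end.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
per_step = broadcast_to_steps(advantages[0], num_text_tokens=3, num_denoise_steps=2)
```

Rollouts that score above the group mean get a positive advantage and those below get a negative one, so no learned value function is needed. How UniGRPO weights or regularizes the text and image terms is not recoverable from this excerpt.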

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques