Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards

Qiang Lyu; Zicong Chen; Chongxiao Wang; Haolin Shi; Shibo Gao; Ran Piao; Youwei Zeng; Jianlou Si; Fei Ding; Jing Li; Chun Pong Lau; Weiqiang Wang

arXiv:2512.00743·cs.CV·December 2, 2025

Open Access

TL;DR

Multi-GRPO introduces a novel advantage estimation framework for text-to-image generation that uses tree-based trajectories and separate reward grouping to improve stability and multi-objective alignment.

Contribution

The paper proposes a multi-group advantage estimation method that combines tree-based trajectories with reward-based grouping to address two limitations of existing GRPO methods for T2I models: shared credit assignment and reward mixing.

Findings

01 Achieves superior stability and alignment on benchmarks.

02 Effectively balances conflicting multi-objective rewards.

03 Demonstrates improved early denoising step estimation.

Abstract

Recently, Group Relative Policy Optimization (GRPO) has shown promising potential for aligning text-to-image (T2I) models, yet existing GRPO-based methods suffer from two critical limitations. (1) Shared credit assignment: trajectory-level advantages derived from group-normalized sparse terminal rewards are uniformly applied across timesteps, failing to accurately estimate the potential of early denoising steps with vast exploration spaces. (2) Reward-mixing: predefined weights for combining multi-objective rewards (e.g., text accuracy, visual quality, text color), which have mismatched scales and variances, lead to unstable gradients and conflicting updates. To address these issues, we propose Multi-GRPO, a multi-group advantage estimation framework with two orthogonal grouping mechanisms. For better credit assignment, we introduce tree-based trajectories…
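
The two limitations named in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the reward names, values, weights, and all function names are hypothetical, and the way per-reward advantages are combined at the end is an assumption, since the abstract is truncated before it describes the method. The tree-based trajectories are not shown for the same reason.

```python
from statistics import mean, pstdev

def group_normalize(rewards, eps=1e-8):
    """GRPO-style advantage: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def mixed_advantages(reward_table, weights):
    """Reward-mixing: weighted sum of raw rewards, then a single normalization."""
    names = list(reward_table)
    n = len(reward_table[names[0]])
    mixed = [sum(weights[k] * reward_table[k][i] for k in names) for i in range(n)]
    return group_normalize(mixed)

def per_reward_advantages(reward_table, weights):
    """Normalize each reward in its own group, then combine.

    The weighted sum used to combine them is an assumption for illustration.
    """
    names = list(reward_table)
    n = len(reward_table[names[0]])
    normed = {k: group_normalize(reward_table[k]) for k in names}
    return [sum(weights[k] * normed[k][i] for k in names) for i in range(n)]

def broadcast_over_timesteps(advantages, num_steps):
    """Shared credit assignment: every denoising step reuses the same advantage."""
    return [[a] * num_steps for a in advantages]

# Hypothetical rewards for a group of 4 images sampled from one prompt.
rewards = {
    "text_accuracy": [0.0, 1.0, 0.0, 1.0],       # values in [0, 1]
    "visual_quality": [20.0, 10.0, 40.0, 30.0],  # much larger scale and variance
}
weights = {"text_accuracy": 0.5, "visual_quality": 0.5}

mixed = mixed_advantages(rewards, weights)
separate = per_reward_advantages(rewards, weights)
per_step = broadcast_over_timesteps(mixed, num_steps=3)
```

With these made-up numbers, the mixed advantage is driven almost entirely by the larger-scale visual quality score, so the image at index 2 (highest quality, wrong text) ranks first. When each reward is normalized within its own group before combining, the image at index 3 (correct text, second-highest quality) ranks first instead. The `per_step` table shows the shared credit assignment issue: each trajectory's single terminal advantage is copied unchanged to every denoising step.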

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications