Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
Shiran Ge, Chenyi Huang, Yuang Ai, Qihang Fan, Huaibo Huang, Ran He

TL;DR
This paper introduces Pro-GRPO, a dynamic framework that enhances trajectory diversity and reduces computational costs in generative model alignment by expanding and pruning trajectories during sampling.
Contribution
We propose Pro-GRPO, a novel dynamic method integrating latent feature-based pruning with an expand-and-prune strategy to improve efficiency and effectiveness in trajectory-based generative model optimization.
Findings
Pro-GRPO reduces computational overhead compared to static, post-sampling filtering such as OVF.
Expanding initial trajectory groups increases diversity and optimization potential.
Pro-GRPO outperforms existing methods on diffusion and flow-based models.
Abstract
Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibitive computational costs. In this work, we investigate this trade-off through empirical studies, yielding two key observations. First, we discover a reward clustering phenomenon in which many trajectories collapse toward the group-mean reward, offering limited optimization value. Second, we design a heuristic strategy named Optimal Variance Filtering (OVF) and verify that a high-variance subset of trajectories selected by OVF can outperform the larger, unfiltered group. However, this static, post-sampling OVF approach still incurs substantial computational overhead, as it performs unnecessary sampling for trajectories that are ultimately discarded. To resolve this, we propose Pro-GRPO (Proactive…
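The abstract does not spell out how OVF picks its subset, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each trajectory has a scalar reward and that the goal is a size-k subset with maximal reward variance. The function names `select_high_variance_subset` and `group_relative_advantages`, the parameter `k`, and the example rewards are all hypothetical. The search relies on the fact that a maximum-variance subset can always be formed from some of the lowest and some of the highest rewards, so only k + 1 candidate splits need checking. The advantage normalization is the usual group-relative form (reward minus group mean, divided by group standard deviation).

```python
import numpy as np


def select_high_variance_subset(rewards, k):
    """Return sorted indices of a size-k subset with maximal reward variance.

    Assumption (illustrative, not from the paper): candidates are built from
    the `low` smallest and `k - low` largest rewards, for low = 0..k.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(rewards)")

    order = np.argsort(rewards)  # indices from lowest to highest reward
    best_idx, best_var = None, -1.0
    for low in range(k + 1):
        high = k - low
        idx = np.concatenate([order[:low], order[n - high:]])
        var = rewards[idx].var()
        if var > best_var:
            best_idx, best_var = idx, var
    return np.sort(best_idx)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within a group: (r - mean) / (std + eps)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Hypothetical group of 8 rewards, most of them clustered near the mean.
rewards = np.array([0.1, 0.5, 0.5, 0.5, 0.52, 0.48, 0.9, 0.5])
kept = select_high_variance_subset(rewards, k=4)
advantages = group_relative_advantages(rewards[kept])
print("kept indices:", kept)
print("advantages:", advantages)
```

In this toy group, the trajectories clustered at the mean reward contribute near-zero advantages, which is the intuition behind keeping only a high-variance subset. Note that this sketch still filters after all trajectories are sampled, which is exactly the cost the abstract attributes to static OVF.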
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games
