Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models

Shiran Ge; Chenyi Huang; Yuang Ai; Qihang Fan; Huaibo Huang; Ran He

arXiv:2512.15347·cs.CV·December 18, 2025

TL;DR

This paper introduces Pro-GRPO, a dynamic framework that enhances trajectory diversity and reduces computational costs in generative model alignment by expanding and pruning trajectories during sampling.

Contribution

We propose Pro-GRPO, a novel dynamic method integrating latent feature-based pruning with an expand-and-prune strategy to improve efficiency and effectiveness in trajectory-based generative model optimization.

Findings

1. Pro-GRPO reduces computational overhead compared to static methods.

2. Expanding initial trajectory groups increases diversity and optimization potential.

3. Pro-GRPO outperforms existing methods on diffusion and flow-based models.

Abstract

Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibitive computational costs. In this work, we investigate this trade-off through empirical studies, yielding two key observations. First, we discover a reward clustering phenomenon in which many trajectories collapse toward the group-mean reward and therefore offer limited optimization value. Second, we design a heuristic strategy named Optimal Variance Filtering (OVF) and verify that a high-variance subset of trajectories selected by OVF can outperform the larger, unfiltered group. However, this static, post-sampling OVF approach still incurs substantial computational overhead, because it fully samples trajectories that are ultimately discarded. To resolve this, we propose Pro-GRPO (Proactive…
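To make the two observations concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the exact OVF criterion, so the function names `group_relative_advantages` and `select_high_variance_subset`, the use of population variance over scalar rewards, and the search over "lowest m plus highest k - m" subsets are all assumptions made for illustration. The first function shows the standard group-normalized advantage used in GRPO, under which rewards near the group mean receive advantages near zero. The second shows one plausible post-sampling filter that keeps a fixed-size, high-variance subset.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Trajectories whose reward sits near the group mean get an advantage
    close to zero, which is the "reward clustering" effect in the abstract.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def select_high_variance_subset(rewards, k):
    """Hypothetical OVF-style filter (an assumption, not the paper's code).

    Among subsets made of the m lowest and (k - m) highest rewards,
    return the indices of the one with the largest reward variance.
    """
    r = np.asarray(rewards, dtype=float)
    if not 1 <= k <= r.size:
        raise ValueError("k must be between 1 and the group size")

    order = np.argsort(r)  # indices sorted by ascending reward
    best_idx, best_var = None, -1.0
    for m in range(k + 1):
        # m lowest-reward indices plus (k - m) highest-reward indices
        idx = np.concatenate([order[:m], order[r.size - (k - m):]])
        var = r[idx].var()
        if var > best_var:
            best_idx, best_var = idx, var
    return np.sort(best_idx)


if __name__ == "__main__":
    # Illustrative rewards: six values clustered near 0.5 plus two outliers.
    rewards = [0.50, 0.52, 0.49, 0.51, 0.90, 0.10, 0.48, 0.53]
    print("advantages:", np.round(group_relative_advantages(rewards), 2))
    kept = select_high_variance_subset(rewards, k=4)
    print("kept indices:", kept)
```

Note that a filter of this kind runs only after every trajectory has been fully sampled and scored, which is the overhead the abstract attributes to the static OVF approach.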

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games