E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
Shengjun Zhang, Zhang Zhang, Chensheng Dai, Yueqi Duan

TL;DR
This paper introduces E-GRPO, a reinforcement learning method that enhances flow models by increasing the entropy of sampling steps, leading to more efficient exploration and improved performance in reward alignment tasks.
Contribution
The paper proposes E-GRPO, an entropy-aware policy optimization technique that merges consecutive low-entropy sampling steps into a single high-entropy step, giving flow models more effective exploration.
Findings
E-GRPO improves exploration efficiency in flow models.
Consolidating low entropy steps enhances reward signal clarity.
Experimental results show superior performance across reward settings.
Abstract
Recent reinforcement learning methods have improved flow matching models on human preference alignment. While stochastic sampling enables the exploration of denoising directions, existing methods that optimize over multiple denoising steps suffer from sparse and ambiguous reward signals. We observe that high-entropy steps enable more efficient and effective exploration, while low-entropy steps result in indistinguishable roll-outs. To this end, we propose E-GRPO, an entropy-aware Group Relative Policy Optimization that increases the entropy of SDE sampling steps. Since the integration of stochastic differential equations suffers from ambiguous reward signals due to stochasticity accumulated over multiple steps, we merge consecutive low-entropy steps to form one high-entropy step for SDE sampling, while applying ODE sampling on the other steps. Building upon this, we introduce multi-step…
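The two ideas named in the abstract, merging consecutive low-entropy steps and scoring roll-outs relative to their group, can be illustrated with a minimal sketch. This is not the authors' implementation: the per-step entropy values, the `threshold` parameter, and both function names are hypothetical, and the advantage formula is the mean and standard deviation normalization commonly used in GRPO-style methods, assumed here rather than taken from the paper.

```python
from statistics import mean, pstdev


def merge_low_entropy_steps(entropies, threshold):
    """Group denoising step indices into segments.

    Consecutive steps whose entropy is below `threshold` (a hypothetical
    cutoff) are collected into one merged segment; every other step
    forms its own single-step segment.
    """
    segments = []
    run = []  # current run of consecutive low-entropy step indices
    for i, h in enumerate(entropies):
        if h < threshold:
            run.append(i)
        else:
            if run:
                segments.append(run)
                run = []
            segments.append([i])
    if run:
        segments.append(run)
    return segments


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of roll-outs.

    Assumed GRPO-style form: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative entropy values only; not measurements from the paper.
    print(merge_low_entropy_steps([0.9, 0.2, 0.1, 0.8, 0.3], threshold=0.5))
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

In this sketch, steps 1 and 2 fall below the cutoff and end up in one merged segment. Per the abstract, such a merged segment would be treated as a single SDE sampling step; how the step entropy is measured and how the merged step is integrated are not specified in the text above.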
Taxonomy
Topics: Reinforcement Learning in Robotics · Recommender Systems and Techniques · Artificial Intelligence in Games
