E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Shengjun Zhang; Zhang Zhang; Chensheng Dai; Yueqi Duan

arXiv:2601.00423 · cs.LG · January 5, 2026


TL;DR

This paper introduces E-GRPO, a reinforcement learning method for flow models that increases the entropy of the SDE sampling steps, leading to more efficient exploration and improved performance on reward alignment tasks.

Contribution

The paper proposes E-GRPO, an entropy-aware Group Relative Policy Optimization method that merges consecutive low-entropy steps into a single high-entropy SDE step for better exploration in flow models.
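
The merging idea can be sketched as a greedy pass over per-step noise scales. This is a minimal sketch under stated assumptions, not the authors' implementation. Each SDE step is treated as an isotropic Gaussian with standard deviation sigma in dim dimensions, whose differential entropy is (dim/2)·log(2πe·sigma²). A merged group is modelled as one Gaussian whose variance is the sum of its members' variances, which assumes the noise injections are independent and additive. The entropy threshold and the helper names gaussian_step_entropy and merge_low_entropy_steps are hypothetical.

```python
import math
from typing import List, Tuple


def gaussian_step_entropy(sigma: float, dim: int) -> float:
    """Differential entropy (nats) of an isotropic Gaussian step with std `sigma` in `dim` dimensions."""
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * sigma ** 2)


def merge_low_entropy_steps(sigmas: List[float], dim: int, threshold: float) -> List[Tuple[int, int]]:
    """Greedily group consecutive steps until each group's entropy reaches `threshold`.

    Hypothetical model: a group acts as one Gaussian step whose variance is the sum of
    its members' variances. Returns half-open step ranges [start, end).
    """
    groups: List[Tuple[int, int]] = []
    start, var_sum = 0, 0.0
    for i, sigma in enumerate(sigmas):
        var_sum += sigma ** 2
        # Zero variance (a pure ODE step) has no finite entropy, so keep accumulating.
        if var_sum > 0 and gaussian_step_entropy(math.sqrt(var_sum), dim) >= threshold:
            groups.append((start, i + 1))
            start, var_sum = i + 1, 0.0
    if start < len(sigmas):
        # Tail that never reached the threshold: fold it into the last group, if any.
        if groups:
            groups[-1] = (groups[-1][0], len(sigmas))
        else:
            groups.append((start, len(sigmas)))
    return groups


if __name__ == "__main__":
    tau = gaussian_step_entropy(0.9, dim=1)  # hypothetical threshold
    print(merge_low_entropy_steps([0.5, 0.5, 0.5, 0.5, 1.0, 0.2, 0.2], dim=1, threshold=tau))
    # prints [(0, 4), (4, 7)]
```

In this example the four low-noise steps 0 to 3 are merged into one group, step 4 clears the threshold on its own, and the trailing steps 5 and 6, which never reach the threshold, are folded into the preceding group. Folding the tail is an arbitrary design choice. Dropping it, or sampling those steps with ODE updates, would be equally plausible.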

Findings

1. E-GRPO improves exploration efficiency in flow models.
2. Consolidating low-entropy steps yields clearer reward signals.
3. Experimental results show superior performance across reward settings.

Abstract

Recent reinforcement learning methods have improved flow matching models for human preference alignment. While stochastic sampling enables the exploration of denoising directions, existing methods that optimize over multiple denoising steps suffer from sparse and ambiguous reward signals. We observe that high-entropy steps enable more efficient and effective exploration, whereas low-entropy steps produce roll-outs that are hard to distinguish. To this end, we propose E-GRPO, an entropy-aware Group Relative Policy Optimization method that increases the entropy of the SDE sampling steps. Because integrating stochastic differential equations over multiple steps yields ambiguous reward signals, owing to the stochasticity accumulated across those steps, we merge consecutive low-entropy steps into a single high-entropy step for SDE sampling and apply ODE sampling to the remaining steps. Building upon this, we introduce multi-step…
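
Sampling with SDE noise on only one step and ODE updates everywhere else, combined with group-relative advantages, can be illustrated on a one-dimensional toy problem. This is a hedged sketch, not the paper's algorithm. The velocity field, reward, noise scale, and the helpers sample_group and group_relative_advantages are hypothetical. The advantage normalisation follows the common GRPO recipe of standardising rewards within a group. The noise step is plain Euler-Maruyama and makes no attempt to preserve the ODE's marginal distributions.

```python
import math
import random
import statistics
from typing import Callable, List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardise each reward against its own group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def sample_group(x0: float, velocity: Callable[[float, float], float], num_steps: int,
                 sde_step: int, sigma: float, group_size: int,
                 rng: random.Random) -> List[float]:
    """Euler-integrate dx = v(x, t) dt over t in [0, 1] for `group_size` roll-outs.

    Only step `sde_step` adds Gaussian noise (Euler-Maruyama); all other steps are
    deterministic ODE updates, so roll-outs diverge only through that one step.
    """
    dt = 1.0 / num_steps
    finals = []
    for _ in range(group_size):
        x = x0
        for k in range(num_steps):
            x += velocity(x, k * dt) * dt
            if k == sde_step:
                x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals


if __name__ == "__main__":
    rng = random.Random(0)
    finals = sample_group(x0=1.0, velocity=lambda x, t: -x, num_steps=10,
                          sde_step=3, sigma=1.0, group_size=8, rng=rng)
    rewards = [-(x - 0.5) ** 2 for x in finals]  # hypothetical reward: closeness to 0.5
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Every roll-out in a group follows the same deterministic trajectory except at sde_step. Differences in reward, and therefore in the advantages, can thus be attributed to that single stochastic step. This is the credit-assignment argument the abstract makes against spreading stochasticity over many steps.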

