Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

Yunze Tong; Mushui Liu; Canyu Zhao; Wanggui He; Shiyi Zhang; Hongwei Zhang; Peng Zhang; Jinlong Liu; Ju Huang; Jiamang Wang; Hao Jiang; Pipei Huang

arXiv:2602.06422·cs.CV·February 9, 2026

Open Access

TL;DR

This paper introduces TurningPoint-GRPO (TP-GRPO), a GRPO framework for flow-based models that replaces sparse outcome-based rewards with dense step-level incremental rewards and explicitly models long-term effects within the denoising trajectory, with the aim of improving text-to-image generation.

Contribution

It proposes a step-level incremental reward mechanism and a turning point detection method to better capture long-term effects in flow-based generative models.

Findings

01. Improves reward signal density and effectiveness.
02. Enhances generation quality and consistency.
03. Efficient, hyperparameter-free turning point detection.

Abstract

Deploying GRPO on Flow Matching models has proven effective for text-to-image generation. However, existing paradigms typically propagate an outcome-based reward to all preceding denoising steps without distinguishing the local effect of each step. Moreover, current group-wise ranking mainly compares trajectories at matched timesteps and ignores within-trajectory dependencies, where certain early denoising actions can affect later states via delayed, implicit interactions. We propose TurningPoint-GRPO (TP-GRPO), a GRPO framework that alleviates step-wise reward sparsity and explicitly models long-term effects within the denoising trajectory. TP-GRPO makes two key innovations: (i) it replaces outcome-based rewards with step-level incremental rewards, providing a dense, step-aware learning signal that better isolates each denoising action's "pure" effect, and (ii) it identifies turning…
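The contrast between an outcome-based reward and a step-level incremental reward can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how intermediate rewards are obtained, so the per-step reward estimates passed in below, and the function names `incremental_rewards`, `group_normalize`, and `stepwise_advantages`, are assumptions made for illustration. The sketch takes the increment at a step to be the change in estimated reward across that step, then applies the usual GRPO group standardization at matched step indices.

```python
from statistics import mean, pstdev


def incremental_rewards(step_rewards):
    """Turn per-step reward estimates into per-step increments.

    step_rewards[k] is a HYPOTHETICAL reward estimate for the sample after
    k denoising steps (step_rewards[-1] is the usual outcome reward).
    The increment for step k is the change in reward that step produced.
    """
    return [after - before for before, after in zip(step_rewards, step_rewards[1:])]


def group_normalize(values, eps=1e-8):
    """GRPO-style advantage: standardize values within one group."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def stepwise_advantages(group_step_rewards):
    """Per-step advantages for trajectories sampled from the same prompt.

    Returns advantages[i][k]: the advantage of step k in trajectory i,
    normalized across the group at the matched step index k.
    """
    increments = [incremental_rewards(r) for r in group_step_rewards]
    num_steps = len(increments[0])
    per_step = [
        group_normalize([inc[k] for inc in increments]) for k in range(num_steps)
    ]
    # Transpose from [step][trajectory] back to [trajectory][step].
    return [[per_step[k][i] for k in range(num_steps)] for i in range(len(increments))]


if __name__ == "__main__":
    # Two trajectories, two denoising steps each (made-up reward estimates).
    group = [
        [0.1, 0.3, 0.2],  # good first step, harmful second step
        [0.1, 0.2, 0.6],  # modest first step, strong second step
    ]
    for i, adv in enumerate(stepwise_advantages(group)):
        print(f"trajectory {i}: {[round(a, 3) for a in adv]}")
```

With an outcome-based reward, every step of the first trajectory would inherit the same below-average signal because its final reward (0.2) is lower than the second trajectory's (0.6). With increments, its first step receives a positive advantage and only its second step is penalized. The increments also telescope: their sum equals the final reward minus the initial one, so no outcome information is discarded.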

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Artificial Intelligence in Games