Rethinking Reward Signals in Video GRPO: When Scores Become Targets

Rui Li; Yuanzhi Liang; Ziqi Ni; Haibing Huang; Chi Zhang; Xuelong Li

arXiv:2511.19356 · cs.CV · March 18, 2026

TL;DR

This paper introduces TaRoS, a reward signaling framework for Video GRPO that counters the loss of reward fidelity that occurs when reward scores become optimization targets, leading to more reliable video generation.

Contribution

It proposes TaRoS, which combines component-level performance assessment with adaptive downweighting of saturated reward components to improve reward robustness and prevent reward hacking in Video GRPO.
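
A minimal NumPy sketch of the general idea behind saturation-aware weighting of multi-component rewards. This is not the TaRoS algorithm as published: the linear ramp on within-group spread, the threshold `tau`, the constant `eps`, and the function name `saturation_aware_advantages` are hypothetical choices for illustration. Rewards are assumed to arrive as a G x K array, one row per video sampled for the same prompt and one column per reward component.

```python
import numpy as np


def saturation_aware_advantages(rewards, tau=0.05, eps=1e-6):
    """Hypothetical sketch of saturation-aware component weighting.

    Not the TaRoS formulation. `rewards` has shape (G, K): one row per
    video sampled for the same prompt, one column per reward component.
    `tau` is a hypothetical spread threshold; components whose
    within-group standard deviation falls below it are downweighted.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=0)
    std = rewards.std(axis=0)

    # Per-component group-relative advantages (GRPO-style standardization).
    comp_adv = (rewards - mean) / (std + eps)

    # Weight rises linearly with within-group spread and is capped at 1,
    # so a component that no longer separates the samples fades out.
    weights = np.clip(std / tau, 0.0, 1.0)
    total = weights.sum()
    if total < eps:
        # Every component is saturated: the group carries no signal.
        return np.zeros(rewards.shape[0]), weights
    weights = weights / total

    # Collapse component advantages into one scalar per sample.
    return comp_adv @ weights, weights


# Four videos for one prompt, three hypothetical components:
# column 0 is saturated (identical scores), columns 1 and 2 still vary.
group = [
    [0.9, 0.2, 0.5],
    [0.9, 0.6, 0.4],
    [0.9, 0.4, 0.6],
    [0.9, 0.8, 0.5],
]
adv, w = saturation_aware_advantages(group)
print(w)    # column 0 receives (near-)zero weight
print(adv)
```

Because the weights depend only on within-group spread, a component that every sample already maxes out stops steering the update, while components that still discriminate between samples keep their influence. The sketch does not model the component-level performance assessment or the intra-group sparsity that the abstract also describes.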

Findings

1. Improved visual fidelity in generated videos
2. Enhanced motion coherence and text-video alignment
3. Reduced reward saturation and shortcut-driven optimization

Abstract

Group Relative Policy Optimization (GRPO) enables stable and preference-oriented updates via group-wise comparisons for post-training video generation. However, GRPO directly optimizes reward-induced advantages. Under sustained optimization, the reward score can lose fidelity as a proxy for true video quality, consistent with the phenomenon described by Goodhart's Law. This leads to two recurring issues: (i) shortcut-driven optimization under composite objectives and (ii) reward saturation within prompt groups. To address these issues, we introduce TaRoS, a Target-Robust Reward Signaling framework for Video generation GRPO. TaRoS leverages component-level performance assessment together with intra-group sparsity to organize multi-aspect rewards towards optimization objectives. In addition, it adaptively downweights components that exhibit saturation, thereby preserving effective…
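
Reward saturation within prompt groups acts through the group-relative advantage that GRPO optimizes. A minimal NumPy sketch of the common standardization A_i = (r_i - mean(r)) / (std(r) + eps), computed over the videos sampled for one prompt; the function name `group_relative_advantages` and the value of `eps` are illustrative assumptions, and the paper's exact normalization may differ.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize the rewards of one prompt group (GRPO-style advantage).

    Uses the population standard deviation; some implementations use the
    sample estimate instead. `eps` guards against division by zero.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


# Spread-out rewards: advantages rank the samples against each other.
print(group_relative_advantages([0.2, 0.5, 0.8]))

# Saturated group: every sample hits the same score, so every
# advantage is zero and the group contributes no policy update.
print(group_relative_advantages([1.0, 1.0, 1.0]))

# Nearly saturated group: a gap of 0.001, which may be mere scorer
# noise, is rescaled into advantages of order 1.
print(group_relative_advantages([0.999, 1.0, 1.0]))
```

The last case shows why saturation hurts beyond lost signal: standardization by itself cannot distinguish a meaningful quality gap from scorer noise once the group's rewards have nearly converged.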

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning