Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Utkarsh Tyagi; Xingang Guo; MohammadHossein Rezaei; Daniel George; Anas Mahmoud; Jackson Lee; Bing Liu; Yunzhong He

arXiv:2605.20164·cs.AI·May 20, 2026

TL;DR

This paper introduces POW3R, a policy-aware rubric reward framework that adapts criterion-level reward weights during reinforcement learning, so that training emphasizes the criteria that currently distinguish the policy's rollouts.

Contribution

POW3R preserves human rubric importance while adaptively emphasizing criteria that distinguish current policy outputs, enhancing training signals in RL with rubric rewards.

Findings

01. POW3R outperforms vanilla GRPO in 24 of 30 policy/metric comparisons.

02. POW3R improves mean rubric reward and strict completion rates.

03. Training with POW3R reaches performance plateaus 2.5 to 4 times faster.

Abstract

Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model behaviors require satisfying several qualitative criteria at once. Rubric-based rewards address this setting by grading prompt-specific criteria and aggregating them into a scalar reward. Yet standard static aggregations conflate a criterion's human-assigned importance with its current usefulness as an optimization signal. We show that this assumption breaks down in rubric RL: many important criteria are already saturated or currently unreachable, while criteria that distinguish rollouts are not necessarily those with the largest human weights. We introduce POW3R, a policy-aware rubric reward framework that preserves human weights and category balance as the rubric objective while adapting criterion-level reward weights…
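
To make the distinction between human-assigned importance and current usefulness concrete, here is a minimal sketch in Python. It is not the POW3R formulation, which the truncated abstract does not spell out. It assumes a hypothetical reweighting rule that scales each human weight by how much that criterion's score varies across a group of rollouts; the names `policy_aware_weights`, `group_scores`, and `human_weights` are illustrative only.

```python
from statistics import pstdev


def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def rubric_reward(criterion_scores, weights):
    """Aggregate per-criterion scores into one scalar reward."""
    return sum(s * w for s, w in zip(criterion_scores, weights))


def policy_aware_weights(group_scores, human_weights):
    """Hypothetical rule (an assumption, not the paper's method):
    multiply each human weight by the spread of that criterion's
    score across the rollout group, then renormalize."""
    spreads = [pstdev(column) for column in zip(*group_scores)]
    adapted = [w * s for w, s in zip(human_weights, spreads)]
    if sum(adapted) == 0:
        # No criterion separates the rollouts: fall back to human weights.
        return normalize(human_weights)
    return normalize(adapted)


# Four rollouts for one prompt, graded on three binary criteria.
# Criterion 0 is saturated (always met), criterion 1 is currently
# unreachable (never met), and criterion 2 is the only one that varies.
group_scores = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
human_weights = [5.0, 3.0, 1.0]  # illustrative importance values

static_w = normalize(human_weights)
adapted_w = policy_aware_weights(group_scores, human_weights)

static_rewards = [rubric_reward(r, static_w) for r in group_scores]
adapted_rewards = [rubric_reward(r, adapted_w) for r in group_scores]
```

In this toy group the two criteria with the largest human weights contribute nothing to the difference between rollouts, so under static aggregation the rewards differ only by the smallest weight (1/9 after normalization). The hypothetical rule moves all of the weight onto the one criterion that separates the rollouts. Note that, unlike the framework described in the abstract, this sketch makes no attempt to preserve category balance or to keep the human weights as the underlying rubric objective.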
