Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning

Baolong Bi, Shenghua Liu, Yiwei Wang, Siqian Tong, Lingrui Mei, Yuyao Ge, Yilong Xu, Jiafeng Guo, Xueqi Cheng

arXiv:2511.12344 · cs.AI · November 20, 2025

TL;DR

This paper introduces RGR-GRPO, a rubric-driven reinforcement learning framework that improves multi-domain reasoning in large language models by providing dense rubric-based rewards and offline guidance, and that outperforms baseline RL methods across 14 benchmarks.

Contribution

The paper presents RGR-GRPO, a novel RL framework leveraging rubrics for multi-domain reasoning, enabling better exploration and outperforming existing methods.

Findings

1. RGR-GRPO outperforms baseline RL methods across 14 benchmarks.
2. It achieves improvements of +7.0% to +8.4% on key reasoning tasks.
3. It maintains stable entropy and improves pass@k performance.
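
The findings mention pass@k and entropy without saying how they are measured. The sketch below is an illustration only and assumes two common conventions, not necessarily the paper's protocol. pass@k uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k) for n sampled generations of which c are correct. Entropy is the mean Shannon entropy of the per-token next-token distributions. The function names pass_at_k, token_entropy and mean_entropy are hypothetical.

```python
import math
from typing import List, Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn without
    replacement from n generations (c of them correct) is correct. Assumes k <= n."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(dists: List[Sequence[float]]) -> float:
    """Average per-token entropy over the distributions of a generated sequence."""
    return sum(token_entropy(d) for d in dists) / len(dists)


# 16 samples for one problem, 4 of them correct: pass@k grows with k.
for k in (1, 4, 8):
    print(k, round(pass_at_k(16, 4, k), 3))

# A peaked (low-entropy) policy vs. a flat (high-entropy) one over 3 tokens.
print(mean_entropy([[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]))
print(mean_entropy([[1 / 3, 1 / 3, 1 / 3]]))  # equals ln 3, about 1.0986
```

Under this reading, an entropy that collapses toward zero means the policy keeps sampling nearly the same tokens. That tends to limit how much pass@k can gain from drawing more samples, which is why the two metrics are often reported together.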

Abstract

Recent advances in reinforcement learning (RL) have significantly improved the complex reasoning capabilities of large language models (LLMs). Despite these successes, existing methods mainly focus on single-domain RL (e.g., mathematics) with verifiable rewards (RLVR), and their reliance on purely online RL frameworks restricts the exploration space, thereby limiting reasoning performance. In this paper, we address these limitations by leveraging rubrics to provide both fine-grained reward signals and offline guidance. We propose RGR-GRPO (Reward and Guidance through Rubrics), a rubric-driven RL framework for multi-domain reasoning. RGR-GRPO enables LLMs to receive dense and informative rewards while exploring a larger solution space during GRPO training. Extensive experiments across 14 benchmarks spanning multiple domains demonstrate that RGR-GRPO consistently outperforms RL…
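
The abstract's two ideas can be illustrated with a minimal sketch: rubric-based dense rewards, and guidance from outside the purely online rollouts. The sketch rests on several assumptions that this page does not confirm. A rubric is taken to be a list of weighted criteria. Each criterion is checked by a toy predicate standing in for an LLM judge. The reward is the weight-normalized fraction of satisfied criteria. Advantages follow the usual GRPO group normalization (r - mean) / (std + eps). The final step appends a hypothetical offline guided response to the group as one possible reading of "offline guidance"; the paper's actual mechanism may differ. All names (Criterion, rubric_reward, grpo_advantages) are illustrative.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric item (hypothetical format)."""
    description: str
    weight: float
    check: Callable[[str], bool]  # toy stand-in for an LLM judge


def rubric_reward(response: str, rubric: List[Criterion]) -> float:
    """Weight-normalized fraction of rubric criteria satisfied, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantage: (r - group mean) / (group std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


rubric = [
    Criterion("names the governing law", 1.0, lambda r: "Newton" in r),
    Criterion("sets up the equation", 2.0, lambda r: "F = m a" in r),
    Criterion("reaches the correct answer", 3.0, lambda r: "answer: 12" in r),
]

# One GRPO group of on-policy rollouts; none reaches the correct answer.
rollouts = [
    "By Newton's second law, F = m a, so answer: 10",
    "Using Newton, answer: 8",
    "answer: 9",
    "F = m a gives answer: 11",
]

binary_rewards = [1.0 if "answer: 12" in r else 0.0 for r in rollouts]
rubric_rewards = [rubric_reward(r, rubric) for r in rollouts]

print(grpo_advantages(binary_rewards))  # all zeros: no learning signal
print(rubric_rewards)                   # graded partial credit
print(grpo_advantages(rubric_rewards))  # non-zero, ordered by partial credit

# Hypothetical offline guidance: a response produced outside the current policy
# (e.g. with rubric hints) is normalized together with the on-policy group.
guided = "By Newton's second law, F = m a, so answer: 12"
mixed_rewards = rubric_rewards + [rubric_reward(guided, rubric)]
print(grpo_advantages(mixed_rewards))   # the guided response gets the largest advantage
```

When every rollout fails a binary final-answer check, the group has zero reward variance, so every GRPO advantage is zero and the batch carries no gradient signal. A graded rubric reward still ranks partially correct responses, and an externally supplied high-scoring response gives the group a target beyond what the current policy samples on its own.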

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling