R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning

Zhizheng Jiang; Kang Zhao; Weikai Xu; Xinkui Lin; Wei Liu; Jian Luan; Shuo Shang; Peng Han

arXiv:2601.19620·cs.LG·January 29, 2026


Open Access

TL;DR

The paper introduces R^3, a reinforcement learning framework for large reasoning models that improves training stability and performance through replay, reflection, and ranking rewards, and reports state-of-the-art results on math benchmarks.

Contribution

It proposes a novel RL mechanism with cross-context replay, self-reflection, and entropy ranking rewards to improve large reasoning models' training and reasoning capabilities.

Findings

01. Achieves state-of-the-art performance on math benchmarks.

02. Demonstrates significant improvements over base models.

03. Requires fewer reasoning tokens for accurate solutions.

Abstract

Large reasoning models (LRMs) aim to solve diverse and complex problems through structured reasoning. Recent advances in group-based policy optimization methods have shown promise in enabling stable advantage estimation without reliance on process-level annotations. However, these methods rely on advantage gaps induced by high-quality samples within the same batch, which makes the training process fragile and inefficient when intra-group advantages collapse under challenging tasks. To address these problems, we propose a reinforcement learning mechanism named R^3 that operates along three directions: (1) a cross-context Replay strategy that maintains the intra-group advantage by recalling valuable examples from historical trajectories of the same query, (2) an in-context self-Reflection mechanism enabling models to refine…
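The advantage collapse mentioned in the abstract, and the idea of recalling a past trajectory for the same query, can be illustrated with a toy sketch. This is not the authors' implementation: the names `group_advantages`, `ReplayBuffer`, and `build_group` are hypothetical, the mean/std normalization is the form commonly used in group-based methods (assumed here), and the policy of swapping one sample for the best stored trajectory is an assumption made only for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Assumed form; when every reward in the group is equal,
    all advantages are zero and the group gives no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class ReplayBuffer:
    """Hypothetical per-query store of the best trajectory seen so far."""

    def __init__(self):
        self._best = {}  # query -> (trajectory, reward)

    def update(self, query, trajectories, rewards):
        for traj, r in zip(trajectories, rewards):
            if query not in self._best or r > self._best[query][1]:
                self._best[query] = (traj, r)

    def recall(self, query):
        return self._best.get(query)


def build_group(query, trajectories, rewards, buffer):
    """If the group's rewards are all equal, swap in a replayed sample.

    Assumption: the last sample is replaced by the stored trajectory
    when that trajectory has a different reward.
    """
    if len(set(rewards)) == 1:
        recalled = buffer.recall(query)
        if recalled is not None and recalled[1] != rewards[0]:
            trajectories = trajectories[:-1] + [recalled[0]]
            rewards = rewards[:-1] + [recalled[1]]
    return trajectories, rewards


# An earlier step produced one correct answer for query "q1".
buffer = ReplayBuffer()
buffer.update("q1", ["old_a", "old_b"], [0.0, 1.0])

# The current group fails on every sample: advantages collapse to zero.
collapsed = group_advantages([0.0, 0.0, 0.0, 0.0])

# Recalling the stored trajectory restores a non-zero advantage gap.
trajs, rs = build_group("q1", ["a", "b", "c", "d"], [0.0, 0.0, 0.0, 0.0], buffer)
restored = group_advantages(rs)
```

In this toy setting the replayed sample receives a positive advantage and the failed samples a negative one, so the group again carries a gradient signal. How R^3 actually selects and weights replayed trajectories is not stated in the text above.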


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)