# rStar2-Agent: Agentic Reasoning Technical Report

**Authors:** Ning Shang, Yifei Liu, Yi Zhu, Li Lyna Zhang, Weijiang Xu, Xinyu Guan, Buze Zhang, Bingcheng Dong, Xudong Zhou, Bowen Zhang, Ying Xin, Ziming Miao, Scarlett Li, Fan Yang, Mao Yang

arXiv: 2508.20722 · 2025-08-29

## TL;DR

rStar2-Agent is a 14B-parameter math reasoning model trained with agentic reinforcement learning to reason with Python coding tools. After only 510 RL steps on 64 MI300X GPUs, it reaches average pass@1 scores of 80.6% on AIME24 and 69.8% on AIME25, surpassing DeepSeek-R1 (671B) with significantly shorter responses.

## Contribution

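The paper contributes three components that make agentic RL effective at scale: an RL infrastructure with a reliable, high-throughput Python code environment; GRPO-RoC, an agentic RL algorithm with a Resample-on-Correct rollout strategy that addresses environment noise from coding tools; and a training recipe that starts with non-reasoning SFT and progresses through multiple RL stages. Together they bring a pre-trained 14B model to state-of-the-art math reasoning within one week on 64 MI300X GPUs.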

## Key findings

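- Achieved average pass@1 of 80.6% on AIME24 and 69.8% on AIME25 with only 510 RL steps
- Demonstrated strong generalization beyond mathematics to alignment, scientific reasoning, and agentic tool-use tasks
- Surpassed the much larger DeepSeek-R1 (671B) on AIME24 and AIME25 with significantly shorter responses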
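The headline numbers are average pass@1 scores. As a minimal sketch of how such a metric is commonly computed, assume each problem is sampled several times and each sample is graded correct or incorrect; the per-problem fraction of correct samples is then averaged across problems. The function name `average_pass_at_1`, the data layout, and the toy data are illustrative assumptions, not taken from the paper.

```python
from statistics import mean


def average_pass_at_1(results):
    """Average pass@1 over a benchmark.

    results: dict mapping a problem id to a list of booleans,
    one per independently sampled answer (True = correct).
    """
    if not results:
        raise ValueError("no problems to score")
    per_problem = []
    for problem_id, outcomes in results.items():
        if not outcomes:
            raise ValueError(f"problem {problem_id!r} has no samples")
        # Fraction of samples that solved this problem.
        per_problem.append(sum(outcomes) / len(outcomes))
    # Unweighted mean across problems.
    return mean(per_problem)


# Hypothetical toy data: two problems, four samples each.
toy = {
    "p1": [True, True, False, True],
    "p2": [False, False, True, False],
}
score = average_pass_at_1(toy)
print(f"average pass@1 = {score:.3f}")
```

Averaging over several samples per problem reduces the variance that a single greedy or sampled run would show on a small benchmark.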

## Abstract

We introduce rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning to achieve frontier-level performance. Beyond current long CoT, the model demonstrates advanced cognitive behaviors, such as thinking carefully before using Python coding tools and reflecting on code execution feedback to autonomously explore, verify, and refine intermediate steps in complex problem-solving. This capability is enabled through three key innovations that make agentic RL effective at scale: (i) an efficient RL infrastructure with a reliable Python code environment that supports high-throughput execution and mitigates the high rollout costs, enabling training on limited GPU resources (64 MI300X GPUs); (ii) GRPO-RoC, an agentic RL algorithm with a Resample-on-Correct rollout strategy that addresses the inherent environment noise from coding tools, allowing the model to reason more effectively in a code environment; (iii) an efficient agent training recipe that starts with non-reasoning SFT and progresses through multiple RL stages, yielding advanced cognitive abilities with minimal compute cost. With these, rStar2-Agent boosts a pre-trained 14B model to state of the art in only 510 RL steps within one week, achieving average pass@1 scores of 80.6% on AIME24 and 69.8% on AIME25, surpassing DeepSeek-R1 (671B) with significantly shorter responses. Beyond mathematics, rStar2-Agent-14B also demonstrates strong generalization to alignment, scientific reasoning, and agentic tool-use tasks. Code and training recipes are available at https://github.com/microsoft/rStar.
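The abstract names GRPO-RoC but does not spell out the procedure. The sketch below is one hypothetical reading of "Resample-on-Correct", not the authors' implementation: oversample rollouts for a prompt, keep the correct ones with the fewest tool errors, uniformly subsample the incorrect ones, and then compute the group-normalized advantages used in standard GRPO. The `Rollout` fields, the proportional split between correct and incorrect rollouts, the binary reward, and the use of the population standard deviation are all assumptions made for illustration; the full paper has the actual method.

```python
import random
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    correct: bool     # final answer matched the reference
    tool_errors: int  # hypothetical count of failed tool calls in the trajectory


def resample_on_correct(rollouts, group_size, rng=None):
    """Downsample an oversampled group to `group_size` rollouts (assumed scheme)."""
    rng = rng or random.Random(0)
    if len(rollouts) < group_size:
        raise ValueError("need at least group_size rollouts")
    # Correct rollouts, cleanest (fewest tool errors) first.
    positives = sorted((r for r in rollouts if r.correct), key=lambda r: r.tool_errors)
    negatives = [r for r in rollouts if not r.correct]
    # Assumption: roughly preserve the correct/incorrect ratio of the oversampled group.
    n_pos = min(round(group_size * len(positives) / len(rollouts)), len(positives))
    n_neg = group_size - n_pos
    if n_neg > len(negatives):
        n_neg = len(negatives)
        n_pos = group_size - n_neg
    # Keep the cleanest correct rollouts; sample incorrect ones uniformly.
    return positives[:n_pos] + rng.sample(negatives, n_neg)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward standardized within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical oversampled group: 4 correct, 4 incorrect rollouts.
oversampled = [Rollout(True, e) for e in (0, 3, 1, 2)] + [Rollout(False, e) for e in (0, 1, 2, 5)]
kept = resample_on_correct(oversampled, group_size=4)
rewards = [1.0 if r.correct else 0.0 for r in kept]
advantages = group_relative_advantages(rewards)
```

Under this reading, correct trajectories that stumbled through many failed tool calls are less likely to be reinforced, while failures are kept as varied as they were sampled.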

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20722/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20722/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/2508.20722/full.md

---
Source: https://tomesphere.com/paper/2508.20722