A Survey of Reinforcement Learning for Large Reasoning Models

Kaiyan Zhang; Yuxin Zuo; Bingxiang He; Youbang Sun; Runze Liu; Che Jiang; Yuchen Fan; Kai Tian; Guoli Jia; Pengfei Li; Yu Fu; Xingtai Lv; Yuchen Zhang; Sihang Zeng; Shang Qu; Haozhan Li; Shijie Wang; Yuru Wang; Xinwei Long; Fangfu Liu; Xiang Xu; Jiaze Ma; Xuekai Zhu; Ermo Hua; Yihao Liu; Zonglin Li; Huayu Chen; Xiaoye Qu; Yafu Li; Weize Chen; Zhenzhao Yuan; Junqi Gao; Dong Li; Zhiyuan Ma; Ganqu Cui; Zhiyuan Liu; Biqing Qi; Ning Ding; Bowen Zhou

arXiv:2509.08827 · cs.CL · October 10, 2025 · 2 cites

Open Access

TL;DR

This survey reviews recent progress in applying Reinforcement Learning to enhance reasoning capabilities of Large Language Models, highlighting challenges and future directions for scaling towards Artificial SuperIntelligence.

Contribution

It provides a comprehensive overview of RL techniques for reasoning with LLMs and LRMs, analyzing recent developments, challenges, and future research opportunities.

Findings

01 RL has significantly advanced reasoning in LLMs.

02 Scaling RL for LRMs faces computational and algorithmic challenges.

03 The survey identifies key research directions for future development.

Abstract

In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into Large Reasoning Models (LRMs). With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of…

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)