DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng

TL;DR
This paper shows that pure reinforcement learning can incentivize reasoning in large language models: they develop advanced reasoning skills without human-labeled reasoning trajectories and perform better on complex, verifiable tasks.
Contribution
It introduces a reinforcement learning framework that promotes emergent reasoning abilities in LLMs without relying on human-annotated reasoning trajectories.
Findings
Models trained with RL outperform supervised counterparts on reasoning tasks.
Emergent reasoning patterns include self-reflection and verification.
Enhanced reasoning skills can be transferred to smaller models.
Abstract
General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and…
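The abstract refers to "verifiable tasks" without describing how a response is scored. As a purely illustrative sketch, not the paper's actual reward, a rule-based check for a math problem might compare the model's final answer against a known reference. The function names, the `\boxed{...}` answer convention, and the 0/1 reward values below are all assumptions made for this example.

```python
import re
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Hypothetical helper: return the content of the last \\boxed{...} in a response.

    Assumes the final answer is wrapped in \\boxed{} and contains no nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None


def verifiable_reward(response: str, reference: str) -> float:
    """Illustrative rule-based reward: 1.0 on an exact answer match, else 0.0.

    No human-written reasoning trace is needed; only the final answer is checked.
    """
    answer = extract_boxed_answer(response)
    if answer is None:
        return 0.0  # no parseable final answer
    return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    sample = "Wait, let me re-check the sum. 19 + 23 = 42, so the answer is \\boxed{42}."
    print(verifiable_reward(sample, "42"))
```

A check like this scores only the outcome, which is what makes a task "verifiable": the intermediate reasoning is left unconstrained and never compared against an annotated trajectory.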
Code & Models
- 🤗 deepseek-ai/DeepSeek-R1 (model) · 2.3M downloads · ♡ 13120
- 🤗 deepseek-ai/DeepSeek-R1-Distill-Qwen-14B (model) · 464k downloads · ♡ 618
- 🤗 deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (model) · 610k downloads · ♡ 799
- 🤗 deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (model) · 807k downloads · ♡ 1466
- 🤗 deepseek-ai/DeepSeek-R1-Distill-Llama-70B (model) · 138k downloads · ♡ 760
- 🤗 deepseek-ai/DeepSeek-R1-Distill-Qwen-32B (model) · 914k downloads · ♡ 1528
- 🤗 deepseek-ai/DeepSeek-R1-Zero (model) · 5.9k downloads · ♡ 947
- 🤗 unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF (model) · 65k downloads · ♡ 101
- 🤗 deepseek-ai/DeepSeek-R1-0528-Qwen3-8B (model) · 150k downloads · ♡ 1044
- 🤗 unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF (model) · 15k downloads · ♡ 100
Videos
- New DeepSeek Research - The Future Is Here! (YouTube)
- Nothing Much Happens in AI, Then Everything Does All At Once (YouTube)
Taxonomy
Methods: Adam · 1-bit Adam
