DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo

TL;DR
DeepSeekMath 7B advances mathematical reasoning in open language models by combining large-scale math-related web data with a new optimization technique (GRPO). It scores 51.7% on the MATH benchmark without external tools, approaching the performance level of Gemini-Ultra and GPT-4.
Contribution
The paper introduces DeepSeekMath 7B, obtained by continuing the pre-training of DeepSeek-Coder-Base-v1.5 7B on 120B math-related tokens selected from Common Crawl, and a novel optimization method called Group Relative Policy Optimization (GRPO).
Findings
Achieves 51.7% on the MATH benchmark without external toolkits or voting techniques.
Self-consistency over 64 samples raises MATH accuracy to 60.9%.
Introduces Group Relative Policy Optimization (GRPO) for better reasoning and memory efficiency.
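This summary does not spell out the GRPO objective, so the following is only a minimal sketch of the "group relative" idea the name refers to: several outputs are sampled for the same question, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration, not details taken from this page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the statistics of its own sampling group.

    Hypothetical helper: population std and eps are illustrative assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one question (1.0 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# When all samples score the same, no sample is preferred over another.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Because the baseline here comes from the group's own statistics rather than from a separately trained value model, a scheme of this kind plausibly accounts for the memory-efficiency claim above.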
Abstract
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization…
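The self-consistency figure quoted in the abstract relies on sampling many solutions and keeping the most common final answer. A minimal sketch of that voting step follows, assuming the final answers have already been extracted from each sampled solution as strings; the function name and the sample answers are hypothetical, and the sampling and answer-extraction steps are not shown.

```python
from collections import Counter


def self_consistency(answers):
    """Return the most frequent final answer among sampled solutions."""
    counts = Counter(answers)
    # most_common(1) yields the top (answer, count) pair; among tied answers,
    # the one seen first wins.
    answer, _ = counts.most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five sampled solutions.
sampled = ["42", "41", "42", "42", "7"]
voted = self_consistency(sampled)
```

In the setting described above the list would hold 64 answers per problem rather than five.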
Code & Models
- 🤗 speakleash/Bielik-Minitron-7B-v3.0-Instruct · model · 3.7k dl · ♡ 17
- 🤗 jingyaogong/minimind-3 · model · 81 dl · ♡ 1
- 🤗 yangerine/grpo-baseline-lr1e5-l1 · model · ♡ 1
- 🤗 deepseek-ai/deepseek-math-7b-base · model · 2.4k dl · ♡ 86
- 🤗 deepseek-ai/deepseek-math-7b-instruct · model · 10k dl · ♡ 149
- 🤗 deepseek-ai/deepseek-math-7b-rl · model · 2.7k dl · ♡ 91
- 🤗 MaziyarPanahi/deepseek-math-7b-instruct-GGUF · model · 362 dl · ♡ 1
- 🤗 megamined/deepseek-math-7b-rl-8.0bpw-exl2 · model · 1 dl · ♡ 1
- 🤗 megamined/deepseek-math-7b-instruct-8.0bpw-exl2 · model · 2 dl
- 🤗 blockblockblock/deepseek-math-7b-rl-bpw2.25 · model · 3 dl
Videos
[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models · YouTube
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Entropy Regularization · Multi-Head Attention · Adam · Residual Connection · Layer Normalization · Dense Connections
