DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao; Peiyi Wang; Qihao Zhu; Runxin Xu; Junxiao Song; Xiao Bi; Haowei Zhang; Mingchuan Zhang; Y.K. Li; Y. Wu; Daya Guo

arXiv:2402.03300 · cs.CL · April 30, 2024 · 68 citations

Open Access

TL;DR

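DeepSeekMath 7B advances mathematical reasoning in open language models by continuing pre-training on 120B math-related tokens selected from Common Crawl and by introducing a new reinforcement-learning method, GRPO. It scores 51.7% on the competition-level MATH benchmark without external tools or voting, approaching the performance of Gemini-Ultra and GPT-4.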

Contribution

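The paper introduces DeepSeekMath 7B, obtained by continuing the pre-training of DeepSeek-Coder-Base-v1.5 7B, which improves mathematical reasoning through a carefully engineered web-data selection pipeline and a new reinforcement-learning method called Group Relative Policy Optimization (GRPO).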

Findings

01

Achieves 51.7% on the MATH benchmark without external toolkits or voting techniques.

02

Self-consistency over 64 samples raises the MATH score to 60.9%.

03

Introduces Group Relative Policy Optimization (GRPO), a variant of PPO that improves mathematical reasoning while reducing PPO's memory usage.
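GRPO's key idea is to drop PPO's learned value model and instead use the rewards of a group of outputs sampled for the same question as the baseline. The sketch below is a minimal, illustrative version of three per-token pieces under outcome supervision: the group-normalized advantage, the PPO-style clipped surrogate, and a non-negative KL estimator against a reference policy. The function names, the use of the population standard deviation, and the default `clip_eps=0.2` and `eps=1e-8` are assumptions for illustration, not values taken from this page.

```python
import math
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of G outputs sampled for one question.

    A_i = (r_i - mean(r)) / (std(r) + eps). Under outcome supervision every
    token of output i reuses A_i. Population std is an assumed choice here.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_token_objective(logp_new: float, logp_old: float,
                            advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for a single token (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return min(unclipped, clipped)


def kl_estimate(logp_new: float, logp_ref: float) -> float:
    """Unbiased, non-negative per-token KL estimate:
    pi_ref/pi_theta - log(pi_ref/pi_theta) - 1."""
    log_ratio = logp_ref - logp_new
    return math.exp(log_ratio) - log_ratio - 1.0


# Two correct (reward 1) and two incorrect (reward 0) answers to one question.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Because the baseline comes from the group itself, correct answers get a positive advantage and incorrect ones a negative advantage without training a separate critic, which is where the memory saving over PPO comes from. If every output in a group gets the same reward, all advantages are zero and that group contributes no policy-gradient signal.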

Abstract

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization…
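Self-consistency, as used for the 60.9% figure, samples many solutions (64 here) and keeps the most frequent final answer. A minimal sketch of the voting step follows; it assumes final answers have already been extracted from each sampled solution as strings (extraction is not shown), and the whitespace-only normalization is a simplifying assumption, since real MATH answers usually need LaTeX-aware equivalence checking.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_vote(answers: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent final answer among sampled solutions.

    Samples with no extractable answer (None) are ignored. Answers are only
    whitespace-stripped (an assumed, simplified normalization). Ties go to
    the answer seen first, because Counter.most_common orders equal counts
    by first insertion.
    """
    counts = Counter(a.strip() for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


samples = ["42", "41", "42", None, " 42 "]
print(majority_vote(samples))
```

Voting over final answers rather than whole reasoning chains lets different derivations that reach the same result reinforce each other.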


Code & Models

Datasets

TheBlueScrubs/TheBlueScrubs-v1
dataset · 15 downloads

Videos

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models · YouTube

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Entropy Regularization · Multi-Head Attention · Adam · Residual Connection · Layer Normalization · Dense Connections