DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

DeepSeek-AI; Qihao Zhu; Daya Guo; Zhihong Shao; Dejian Yang; Peiyi Wang; Runxin Xu; Y. Wu; Yukun Li; Huazuo Gao; Shirong Ma; Wangding Zeng; Xiao Bi; Zihui Gu; Hanwei Xu; Damai Dai; Kai Dong; Liyue Zhang; Yishi Piao; Zhibin Gou; Zhenda Xie; Zhewen Hao; Bingxuan Wang; Junxiao Song; Deli Chen; Xin Xie; Kang Guan; Yuxiang You; Aixin Liu; Qiushi Du; Wenjun Gao; Xuan Lu; Qinyu Chen; Yaohui Wang; Chengqi Deng; Jiashi Li; Chenggang Zhao; Chong Ruan; Fuli Luo; Wenfeng Liang

arXiv:2406.11931·cs.SE·June 19, 2024·48 cites

Open Access 1 Repo

TL;DR

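DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code model that performs comparably to GPT-4 Turbo on code-specific tasks. It is built by continuing pre-training from an intermediate DeepSeek-V2 checkpoint on an additional 6 trillion tokens, and it expands programming-language support from 86 to 338 and context length from 16K to 128K.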

Contribution

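The paper introduces a large-scale, open-source MoE code model that substantially improves on DeepSeek-V2 in coding and mathematical reasoning while maintaining comparable performance on general language tasks, surpassing many closed-source models in benchmarks.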

Findings

01. Achieves performance comparable to GPT-4 Turbo in code-specific tasks.

02. Expands programming-language support from 86 to 338 and context length from 16K to 128K.

03. Outperforms closed-source models in coding and math benchmarks.

Abstract

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2…

Code & Models

Repositories

deepseek-ai/deepseek-coder-v2 (PyTorch, official)
