DeepSeek-V3 Technical Report
DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen

TL;DR
DeepSeek-V3 is a large, efficient Mixture-of-Experts language model with 671 billion parameters, achieving high performance through innovative architectures and training strategies while maintaining cost-effectiveness and training stability.
Contribution
Introduces DeepSeek-V3, a Mixture-of-Experts language model that pioneers an auxiliary-loss-free load-balancing strategy and adopts a multi-token prediction training objective.
Findings
Outperforms other open-source models in comprehensive evaluations
Achieves performance comparable to leading closed-source models
Requires only 2.788M H800 GPU hours for training
Abstract
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800…
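The "671B total parameters with 37B activated for each token" figure can be made concrete with a little arithmetic and a toy router. The sketch below is illustrative only: the parameter counts come from the abstract, but `top_k_route`, the softmax-over-selected-experts gate, the eight hypothetical expert scores, and `k=2` are all assumptions made for this example and are not the routing function used by DeepSeekMoE.

```python
import math

TOTAL_PARAMS_B = 671   # total parameters in billions (from the abstract)
ACTIVE_PARAMS_B = 37   # parameters activated per token in billions (from the abstract)


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters used to process a single token."""
    return active_b / total_b


def top_k_route(scores: list[float], k: int) -> dict[int, float]:
    """Toy top-k gate (an assumption, not the paper's router).

    Keeps the k highest-scoring experts and normalises their softmax
    weights so the selected gates sum to 1. All other experts stay idle
    for this token, which is why only part of the model is activated.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exp = {i: math.exp(scores[i] - peak) for i in top}
    norm = sum(exp.values())
    return {i: w / norm for i, w in exp.items()}


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"active fraction per token: {frac:.1%}")

    # Hypothetical router scores for 8 experts; only 2 are selected.
    gates = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9], k=2)
    print(gates)
```

Roughly one parameter in eighteen is used per token under these numbers, which is the source of the inference and training savings the abstract attributes to the sparse architecture.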
Code & Models
- 🤗 deepseek-ai/DeepSeek-V3 · model · 658k dl · ♡ 4020
- 🤗 deepseek-ai/DeepSeek-V3-0324 · model · 392k dl · ♡ 3094
- 🤗 zakonitebg2023/ZakonGPT · model
- 🤗 deepseek-ai/DeepSeek-V3-Base · model · 18k dl · ♡ 1684
- 🤗 Gokuldaskumar/deepseekv3 · model · 4 dl
- 🤗 unsloth/DeepSeek-V3 · model · 12k dl · ♡ 13
- 🤗 unsloth/DeepSeek-V3-bf16 · model · 160 dl · ♡ 16
- 🤗 unsloth/DeepSeek-V3-GGUF · model · 1.4k dl · ♡ 136
- 🤗 clydedatastruct/russell-v1 · model · 1 dl
- 🤗 jobs-git/DeepSeek-V3 · model · 5 dl
Videos
DeepSeek V3 - The King is Back…For Free! · youtube
"OpenAI is Not God" - The DeepSeek Documentary on Liang Wenfeng, R1 and What's Next · youtube
Taxonomy
Topics: Distributed and Parallel Computing Systems · Robotics and Automated Systems
Methods: Softmax · Attention Is All You Need
