Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao, Yi Shen, Shuming Shi, Jieyun Huang, Zhihao Chen, Ning Wang, Siqi Xiao, Jian Zhang, Kai Wang, Shiguo Lian

TL;DR
This paper evaluates the impact of multi-bitwidth quantization on DeepSeek models, demonstrating that 4-bit quantization preserves performance and introducing a dynamic 3-bit method that enables efficient single-machine deployment.
Contribution
It provides the first comprehensive quantitative analysis of quantization effects on DeepSeek models and proposes a novel dynamic 3-bit quantization method that outperforms traditional approaches.
Findings
4-bit quantization maintains performance close to FP8.
The proposed DQ3_K_M method significantly outperforms traditional Q3_K_M.
DQ3_K_M enables single-machine deployment on high-end GPUs.
Abstract
Recently, there has been high demand for deploying DeepSeek-R1 and V3 locally, possibly because the official service is often busy and some organizations have data privacy concerns. While single-machine deployment offers infrastructure simplicity, the models' 671B-parameter FP8 configuration exceeds the practical memory limits of a standard 8-GPU machine. Quantization is a widely used technique for reducing model memory consumption. However, it is unclear how DeepSeek-R1 and V3 perform after quantization. This technical report presents the first quantitative evaluation of multi-bitwidth quantization across the complete DeepSeek model spectrum. Key findings reveal that 4-bit quantization incurs little performance degradation versus FP8 while enabling single-machine deployment on standard NVIDIA GPU devices. We further propose DQ3_K_M, a dynamic…
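The memory argument in the abstract can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it counts weights at a nominal bit width and ignores the KV cache, activations, and per-block quantization metadata, so real footprints will be somewhat larger. The 8 x 80 GB machine is an assumption made for the example, not a figure from the paper, and the names `weight_memory_gb` and `fits_single_machine` are hypothetical helpers.

```python
NUM_PARAMS = 671e9            # parameter count stated in the abstract
ASSUMED_NUM_GPUS = 8          # "standard 8-GPU machine" from the abstract
ASSUMED_GPU_MEMORY_GB = 80.0  # assumption: 80 GB per GPU, not stated in the source


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_single_machine(bits_per_weight: float) -> bool:
    """True if the weights alone fit in the assumed total GPU memory."""
    budget_gb = ASSUMED_NUM_GPUS * ASSUMED_GPU_MEMORY_GB
    return weight_memory_gb(NUM_PARAMS, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    for label, bits in [("FP8", 8), ("4-bit", 4), ("3-bit", 3)]:
        size = weight_memory_gb(NUM_PARAMS, bits)
        print(f"{label}: {size:.1f} GB, fits weights-only: {fits_single_machine(bits)}")
```

Under these assumptions the FP8 weights alone (about 671 GB) exceed the 640 GB budget, while nominal 4-bit and 3-bit weights leave headroom for runtime state. Actual quantized formats use mixed bit widths, so their average bits per weight differ from the nominal values used here.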
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Distributed and Parallel Computing Systems · Advanced Computing and Algorithms
