MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Qi Zhang, Ruihao Gong, Fengwei Yu, Junjie Yan

TL;DR
MQBench provides a comprehensive benchmark to evaluate the reproducibility and deployability of model quantization algorithms across various hardware platforms, revealing significant gaps and insights for future research.
Contribution
This work introduces MQBench, a first attempt at a benchmark that systematically evaluates and analyzes the reproducibility and deployability of quantization algorithms on multiple hardware platforms.
Findings
Existing algorithms perform similarly on academic benchmarks.
Large accuracy gaps exist in hardware deployment scenarios.
No current algorithm excels across all deployment challenges.
Abstract
Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability for model quantization algorithms. We choose multiple different platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts like a bridge to connect the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or…
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · CCD and CMOS Imaging Sensors
