CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

TL;DR
CSQ introduces a stable, fully-differentiable bi-level continuous sparsification method for mixed-precision quantization in DNNs, enabling efficient search for optimal precision schemes with improved accuracy-efficiency tradeoffs.
Contribution
The paper proposes CSQ, a novel bi-level continuous sparsification approach for stable, differentiable mixed-precision quantization scheme search in neural networks.
Findings
CSQ achieves a better efficiency-accuracy tradeoff than previous mixed-precision quantization methods.
It lets the precision of each layer grow or be pruned dynamically during training.
Experiments validate improved stability and performance across models and datasets.
Abstract
Mixed-precision quantization has been widely applied to deep neural networks (DNNs) as it leads to significantly better efficiency-accuracy tradeoffs compared to uniform quantization. Meanwhile, determining the exact precision of each layer remains challenging. Previous attempts at bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence. In this work, we propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability. CSQ stabilizes the bit-level mixed-precision training process with a bi-level gradual continuous sparsification on both the bit values of the quantized weights and the bit selection in determining the quantization precision of each layer. The continuous sparsification scheme enables…
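The bi-level idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each relaxation is a temperature-scaled sigmoid, that weights are unsigned, and that the layer shares one selection gate per bit plane. The names `soft_gate`, `relaxed_weight`, `layer_precision`, `bit_logits`, `select_logits`, and `beta` are all hypothetical.

```python
import numpy as np

def soft_gate(logits, beta):
    """Temperature-scaled sigmoid (an assumed relaxation); nears a 0/1 step as beta grows."""
    return 1.0 / (1.0 + np.exp(-beta * logits))

def relaxed_weight(bit_logits, select_logits, beta, scale=1.0):
    """Rebuild weights from relaxed bit values and a relaxed per-layer bit mask.

    bit_logits:    (n_bits, n_weights) hypothetical logits, one per bit of each weight
    select_logits: (n_bits,) hypothetical logits, one per bit plane of the layer
    """
    n_bits = bit_logits.shape[0]
    bits = soft_gate(bit_logits, beta)        # level 1: relaxed bit values
    mask = soft_gate(select_logits, beta)     # level 2: relaxed bit selection
    place = 2.0 ** np.arange(n_bits)          # place values, least significant bit first
    # Weighted sum over bit planes, normalized by the largest n_bits-bit integer.
    total = ((mask * place)[:, None] * bits).sum(axis=0)
    return scale * total / (2.0 ** n_bits - 1)

def layer_precision(select_logits):
    """Number of bit planes whose gate would be on after hard thresholding."""
    return int((select_logits > 0).sum())
```

With a small `beta` every bit and every gate stays fractional, so the whole expression is smooth in the logits. Raising `beta` over training pushes both levels toward hard 0/1 values, and the number of bit planes left switched on can then be read off as the layer's precision.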
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Pruning
