TL;DR
This paper presents SYQ, a symmetric quantization method that learns codebooks for weight subgroups in neural networks, significantly improving accuracy at very low precisions while maintaining hardware efficiency.
Contribution
Introduces a symmetric quantization approach that learns codebooks for weight subgroups, reducing accuracy loss in low-precision neural network quantization.
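To make "a symmetric codebook per weight subgroup" concrete, here is a minimal sketch. It assumes each row of a weight matrix is one subgroup and that each row has its own learned scale alpha. A binary row then uses the codebook {-alpha, +alpha}, and a ternary row uses {-alpha, 0, +alpha}. The dead-zone rule `threshold_ratio * max|w_row|` and all names (`symmetric_quantize`, `alphas`, `threshold_ratio`) are hypothetical illustrations, not taken from the paper.

```python
from typing import List


def symmetric_quantize(weights: List[List[float]], alphas: List[float],
                       mode: str = "ternary",
                       threshold_ratio: float = 0.05) -> List[List[float]]:
    """Quantize each row (assumed subgroup) to a symmetric codebook scaled by that row's alpha.

    binary:  codebook {-alpha, +alpha}
    ternary: codebook {-alpha, 0, +alpha}; |w| <= threshold_ratio * max|w_row| maps to 0
    (the threshold rule is an illustrative assumption).
    """
    if len(weights) != len(alphas):
        raise ValueError("expected one scaling factor per subgroup (row)")
    out = []
    for row, alpha in zip(weights, alphas):
        if mode == "binary":
            out.append([alpha if w >= 0 else -alpha for w in row])
        elif mode == "ternary":
            t = threshold_ratio * max(abs(w) for w in row)
            out.append([0.0 if abs(w) <= t else (alpha if w > 0 else -alpha) for w in row])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


W = [[0.9, -0.02, -0.5],
     [0.1, -0.4, 0.03]]
alphas = [0.7, 0.25]  # hypothetical learned scales, one per row
print(symmetric_quantize(W, alphas))            # [[0.7, 0.0, -0.7], [0.25, -0.25, 0.25]]
print(symmetric_quantize(W, alphas, "binary"))  # [[0.7, -0.7, -0.7], [0.25, -0.25, 0.25]]
```

Each row still holds only two or three distinct values, so the representation stays as simple as plain binary or ternary weights. The only extra cost is one scale per subgroup.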
Findings
Symmetric quantization improves accuracy for binary and ternary networks.
The method maintains hardware simplicity for low-precision representations.
Empirical results show significant accuracy gains with minimal hardware impact.
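One way to see why a shared, symmetric scale per subgroup keeps hardware simple is that the scale factors out of the dot product. With codes in {-1, 0, +1}, a row's dot product with integer activations needs only additions and subtractions, followed by a single multiply by alpha. The sketch below illustrates this general property of scaled ternary weights. It is not the paper's hardware design, and the function names and activation values are hypothetical.

```python
from typing import List


def dot_dequantized(q_row: List[float], x: List[int]) -> float:
    """Reference: multiply every (already scaled) weight by its activation."""
    return sum(q * xi for q, xi in zip(q_row, x))


def dot_factored(codes: List[int], alpha: float, x: List[int]) -> float:
    """Same result from integer adds/subtracts plus one multiply by alpha."""
    acc = 0  # integer accumulator
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
        # c == 0: the weight is skipped entirely
    return alpha * acc


codes = [1, 0, -1]
alpha = 0.5
x = [3, 5, -2]  # hypothetical low-bit integer activations
print(dot_factored(codes, alpha, x))                   # 2.5
print(dot_dequantized([alpha * c for c in codes], x))  # 2.5
```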
Abstract
Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy in constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited-entry codebook. At very low precisions, such as binary or ternary networks with 1- to 8-bit activations, the information loss from quantization leads to significant accuracy degradation due to large gradient mismatches between the forward and backward functions. In this paper, we introduce a quantization method to reduce this loss by learning a symmetric codebook for particular weight subgroups. These subgroups are determined based on their locality in the weight matrix, such that the hardware simplicity of the low-precision representations is preserved. Empirically, we show…
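The "gradient mismatch" arises because the forward quantizer is a step function whose true derivative is zero almost everywhere. Training therefore substitutes a surrogate gradient in the backward pass. The sketch below uses the straight-through estimator (STE), a common choice that we assume here for illustration; the paper's exact backward rule is not stated above. It also shows how a per-subgroup scale alpha can be learned: since q_i = alpha * c_i, the gradient dL/dalpha is the sum of c_i * dL/dq_i over the subgroup. The names `ternary_codes`, `ste_backward`, `sgd_step`, `clip`, and `lr`, and the upstream gradients, are all hypothetical.

```python
from typing import List, Tuple


def ternary_codes(row: List[float], t: float) -> List[int]:
    """Forward: step function mapping latent weights to {-1, 0, +1} (dead zone |w| <= t).
    Its true derivative is zero almost everywhere."""
    return [0 if abs(w) <= t else (1 if w > 0 else -1) for w in row]


def ste_backward(row: List[float], codes: List[int], grad_q: List[float],
                 clip: float = 1.0) -> Tuple[List[float], float]:
    """Backward for q_i = alpha * c_i, given upstream gradients dL/dq_i."""
    # STE surrogate: treat dq/dw as 1 inside [-clip, clip] and 0 outside.
    # The gap between this surrogate and the step function is the gradient mismatch.
    grad_w = [g if abs(w) <= clip else 0.0 for w, g in zip(row, grad_q)]
    # dq_i/dalpha = c_i, so only nonzero codes contribute to the scale's gradient.
    grad_alpha = sum(c * g for c, g in zip(codes, grad_q))
    return grad_w, grad_alpha


def sgd_step(row: List[float], alpha: float, grad_q: List[float], t: float,
             lr: float = 0.1) -> Tuple[List[float], float]:
    """One plain SGD update of the latent weights and the subgroup's scale."""
    codes = ternary_codes(row, t)
    grad_w, grad_alpha = ste_backward(row, codes, grad_q)
    new_row = [w - lr * g for w, g in zip(row, grad_w)]
    return new_row, alpha - lr * grad_alpha


row = [0.9, -0.02, -0.5]
grad_q = [0.2, -0.1, 0.4]  # hypothetical upstream gradients dL/dq
new_row, new_alpha = sgd_step(row, alpha=0.7, grad_q=grad_q, t=0.045)
print(ternary_codes(row, 0.045))  # [1, 0, -1]
print(round(new_alpha, 4))        # 0.72
```

The middle weight is quantized to 0 yet still receives an STE gradient, so its latent value keeps moving and can later cross the threshold. The scale alpha, by contrast, is updated only through the weights whose codes are nonzero.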
