Learning Sparse Low-Precision Neural Networks With Learnable Regularization
Yoojin Choi, Mostafa El-Khamy, Jungwon Lee

TL;DR
This paper introduces a learnable regularization approach to train low-precision neural networks, improving accuracy and compression ratios by aligning high-precision weights with their quantized counterparts.
Contribution
It proposes a novel mean squared quantization error (MSQE) regularizer with a learnable coefficient and integrates weight pruning, quantization, and entropy coding for effective low-precision DNN compression.
Findings
Achieved state-of-the-art compression ratios of 7.13 and 6.79 on ImageNet with MobileNet and ShuffleNet.
Produced 8-bit low-precision models for super-resolution with negligible performance loss.
Enhanced training convergence and accuracy of low-precision neural networks.
Abstract
We consider learning deep neural networks (DNNs) that consist of low-precision weights and activations for efficient inference of fixed-point operations. In training low-precision networks, gradient descent in the backward pass is performed with high-precision weights while quantized low-precision weights and activations are used in the forward pass to calculate the loss function for training. Thus, the gradient descent becomes suboptimal, and accuracy loss follows. In order to reduce the mismatch in the forward and backward passes, we utilize mean squared quantization error (MSQE) regularization. In particular, we propose using a learnable regularization coefficient with the MSQE regularizer to reinforce the convergence of high-precision weights to their quantized values. We also investigate how partial L2 regularization can be employed for weight pruning in a similar manner. Finally,…
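To make the MSQE regularization idea from the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The uniform symmetric quantizer, the step size `delta`, the `exp` parameterization of the coefficient, and the `-lam * log_alpha` penalty (included so the learnable coefficient is not simply driven to zero) are all assumptions made for illustration, as are the function names.

```python
import math


def quantize(w, delta, n_bits):
    """Hypothetical uniform quantizer: snap w to a multiple of delta, then clip.

    Python's round() resolves exact ties to the nearest even integer.
    """
    q_max = 2 ** (n_bits - 1) - 1
    q_min = -(2 ** (n_bits - 1))
    level = round(w / delta)
    level = max(q_min, min(q_max, level))
    return level * delta


def msqe(weights, delta, n_bits):
    """Mean squared error between high-precision weights and their quantized values."""
    return sum((w - quantize(w, delta, n_bits)) ** 2 for w in weights) / len(weights)


def regularized_loss(task_loss, weights, delta, n_bits, log_alpha, lam=1.0):
    """Task loss plus MSQE weighted by a learnable coefficient alpha = exp(log_alpha).

    Assumption: the -lam * log_alpha term penalizes small alpha, so training
    cannot trivially minimize the objective by shrinking alpha toward zero.
    """
    alpha = math.exp(log_alpha)
    return task_loss + alpha * msqe(weights, delta, n_bits) - lam * log_alpha


if __name__ == "__main__":
    weights = [0.13, -0.26, 0.41]
    print(regularized_loss(1.0, weights, delta=0.1, n_bits=4, log_alpha=0.0))
```

In an actual training loop, `log_alpha` would be updated by gradient descent along with the weights. As it grows, the MSQE term dominates and pulls the high-precision weights toward their quantized values, which is the convergence behavior the abstract describes.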
Taxonomy
Methods: Pruning · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
