Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio

arXiv:1609.07061·cs.NE·September 23, 2016·1.4k cites

TL;DR

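This paper presents a method for training quantized neural networks with extremely low precision weights and activations (as low as 1 bit). The forward pass needs far less memory and replaces most arithmetic with bit-wise operations, which is expected to cut power consumption sharply, while accuracy stays comparable to 32-bit networks.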

Contribution

The authors introduce a training approach for quantized neural networks in which the quantized weights and activations, including 1-bit weights, are used to compute the parameter gradients. The resulting networks reach accuracy comparable to full-precision (32-bit) models.
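
To make "low-precision weights and activations" concrete, here is a minimal sketch of a generic uniform quantizer. It is an illustration under assumptions, not necessarily the quantization scheme used in the paper: the function name `quantize_uniform`, the value range [-1, 1], and the round-to-nearest rule are all hypothetical choices made for this example.

```python
def quantize_uniform(x, bits, lo=-1.0, hi=1.0):
    """Snap x to the nearest of 2**bits evenly spaced levels in [lo, hi].

    Assumption: a plain uniform grid with clipping; the range and rounding
    rule are illustrative choices, not taken from the paper.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    steps = 2 ** bits - 1              # number of intervals between levels
    step = (hi - lo) / steps           # spacing between adjacent levels
    clipped = min(max(x, lo), hi)      # saturate values outside the range
    index = round((clipped - lo) / step)
    return lo + index * step


# Sweep the range and collect the distinct values each bit width can produce.
samples = [i / 100.0 - 1.0 for i in range(201)]
levels_1bit = sorted({quantize_uniform(s, 1) for s in samples})
levels_2bit = sorted({quantize_uniform(s, 2) for s in samples})

print("1-bit levels:", levels_1bit)
print("2-bit levels:", levels_2bit)
```

With 1 bit the grid collapses to the two end points of the range, and with 2 bits it has four levels, which is why storage per value drops from 32 bits to 1 or 2.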

Findings

01

QNNs achieve prediction accuracy comparable to their 32-bit counterparts on MNIST, CIFAR-10, SVHN and ImageNet.

02

A binary matrix multiplication GPU kernel runs the MNIST QNN 7 times faster than an unoptimized GPU kernel.

03

Gradient quantization to 6 bits enables bit-wise gradient computation.

Abstract

We introduce a method to train Quantized Neural Networks (QNNs), neural networks with extremely low precision (e.g., 1-bit) weights and activations at run-time. At train-time, the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized…
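
The claim that the forward pass can "replace most arithmetic operations with bit-wise operations" is easiest to see in the 1-bit case. The sketch below is illustrative only and is not the authors' GPU kernel: it binarizes two vectors to +1/-1 with a sign function, packs them into integers, and computes their dot product with XNOR and a population count. The helper names (`binarize`, `pack_bits`, `xnor_popcount_dot`) and the sample values are hypothetical.

```python
def binarize(values):
    """Deterministic sign binarization: map each real value to +1 or -1."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an int, encoding +1 as bit 1 and -1 as bit 0."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree. Each agreement adds +1
    and each disagreement adds -1, so dot = agreements - (n - agreements).
    """
    mask = (1 << n) - 1                       # keep only the n valid bits
    agreements = bin(~(a_word ^ b_word) & mask).count("1")
    return 2 * agreements - n


# Hypothetical sample data, chosen only for illustration.
weights = [0.7, -1.2, 0.1, -0.4, 2.0, -0.3, 0.0, 0.9]
activations = [-0.5, -0.8, 1.5, 0.2, 0.6, -1.1, -0.2, 0.4]

w_bin = binarize(weights)
a_bin = binarize(activations)

# Reference result using ordinary multiply-accumulate on the +1/-1 values.
reference = sum(w * a for w, a in zip(w_bin, a_bin))
# Same result using only XOR, NOT, AND and a bit count.
bitwise = xnor_popcount_dot(pack_bits(w_bin), pack_bits(a_bin), len(w_bin))

print(reference, bitwise)
```

Both paths give the same value, but the bit-wise path handles a whole machine word of weights per instruction, which is the source of the memory and arithmetic savings described in the abstract.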

Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and Data Classification

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax