Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Shuchang Zhou; Yuzhi Wang; He Wen; Qinyao He; Yuheng Zou

arXiv:1706.07145·cs.CV·June 23, 2017·2 cites

TL;DR

This paper introduces a balanced quantization method for neural networks that improves accuracy by addressing distribution imbalances in parameters, without increasing inference computation or training time.

Contribution

It proposes a novel percentile-based recursive partitioning approach for balanced quantization, enhancing QNN performance on standard datasets.

Findings

01 Improved top-5 error rate on ImageNet with 4-bit quantized GoogLeNet

02 Effective for both CNNs and RNNs without extra inference cost

03 Outperforms state-of-the-art QNN methods

Abstract

Quantized Neural Networks (QNNs), which use low-bitwidth numbers for representing parameters and performing computations, have been proposed to reduce computational complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized, such that the multiplications and additions can be accelerated by bitwise operations. However, distributions of parameters in neural networks are often imbalanced, such that a uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that can ensure the balance of distributions of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins, and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the computation overhead…
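
To make the contrast in the abstract concrete, here is a minimal sketch in plain Python, not the authors' implementation. It compares uniform quantization whose range is set by the extremal values with a recursive median split that yields equally populated bins. The function names (`uniform_quantize`, `balanced_quantize`, `to_levels`), the exact-median split via sorting, and the output range [-1, 1] are all assumptions made for illustration; the cheaper percentile approximations mentioned in the abstract are not shown.

```python
import random
from collections import Counter


def uniform_quantize(values, bits):
    """Uniform quantization over [min, max]: returns one bin index per value."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits
    if hi == lo:
        return [0] * len(values)
    step = (hi - lo) / levels
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((v - lo) / step), levels - 1) for v in values]


def balanced_quantize(values, bits):
    """Split recursively at the median `bits` times (illustrative sketch).

    Each split sends the lower half to code 2*c and the upper half to
    code 2*c + 1, so the 2**bits bins end up (nearly) equally populated.
    """
    codes = [0] * len(values)

    def split(indices, depth, code):
        if depth == 0:
            for i in indices:
                codes[i] = code
            return
        mid = len(indices) // 2
        split(indices[:mid], depth - 1, 2 * code)
        split(indices[mid:], depth - 1, 2 * code + 1)

    # Sort once; every half of a sorted list is itself sorted.
    order = sorted(range(len(values)), key=lambda i: values[i])
    split(order, bits, 0)
    return codes


def to_levels(codes, bits):
    """Map bin indices to evenly spaced levels in [-1, 1] (assumed range, bits >= 1)."""
    top = 2 ** bits - 1
    return [-1.0 + 2.0 * c / top for c in codes]


# Hypothetical data: roughly Gaussian weights plus a single large outlier.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1023)] + [50.0]
print("uniform :", sorted(Counter(uniform_quantize(weights, 2)).items()))
print("balanced:", sorted(Counter(balanced_quantize(weights, 2)).items()))
```

With an outlier stretching the range, the uniform scheme puts nearly every weight into one bin, while the median split fills all four bins evenly. Sorting is used here only for clarity; it costs more than the approximations the paper alludes to.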

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning

Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling