Trained Ternary Quantization
Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

TL;DR
This paper introduces Trained Ternary Quantization (TTQ), a method that reduces neural network weights to ternary values. The resulting models are nearly 16x smaller than their full-precision counterparts, lose very little accuracy, and in some cases (32-, 44-, and 56-layer ResNets on CIFAR-10, AlexNet on ImageNet) are more accurate.
Contribution
The paper presents a trained quantization approach that learns both the ternary values and the ternary assignment of the weights. The quantized models can be trained from scratch, as with a normal full-precision model, and show very little accuracy degradation.
Findings
TTQ models are nearly 16x smaller than full-precision models.
TTQ outperforms the full-precision 32-, 44-, and 56-layer ResNet models on CIFAR-10.
TTQ achieves higher accuracy than previous ternary models on ImageNet.
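The "nearly 16x" figure follows from replacing 32-bit weights with 2-bit ternary codes, with a small overhead for the scaling factors. The sketch below works through that arithmetic. The function name, the layer and weight counts, and the choice of two 32-bit scaling factors per layer are illustrative assumptions, not figures from this page.

```python
def compression_ratio(num_weights, num_layers,
                      full_bits=32, ternary_bits=2, scales_per_layer=2):
    """Ratio of full-precision storage to ternary storage, in bits.

    Assumption: each layer stores `scales_per_layer` full-precision
    scaling factors alongside its 2-bit weight codes.
    """
    full_size = num_weights * full_bits
    ternary_size = (num_weights * ternary_bits
                    + num_layers * scales_per_layer * full_bits)
    return full_size / ternary_size


# Hypothetical network: 1,000,000 weights spread over 10 layers.
ratio = compression_ratio(num_weights=1_000_000, num_layers=10)
print(f"compression ratio: {ratio:.3f}")
```

The ratio approaches 32 / 2 = 16 as the number of weights grows, and stays slightly below 16 because the scaling factors are kept in full precision.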
Abstract
Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values. This method has very little accuracy degradation and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNet) on CIFAR-10 and AlexNet on ImageNet. Our AlexNet model is trained from scratch, which means it is as easy to train as a normal full-precision model. We highlight that our trained quantization method can learn both ternary values and ternary assignment. During inference, only ternary values (2-bit weights) and scaling factors are needed, so our models are nearly 16x smaller than full-precision models. Our…
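To make "ternary values" and "ternary assignment" concrete, here is a minimal sketch of quantizing one layer's weights for inference. It assumes a threshold proportional to the largest weight magnitude and two positive scaling factors per layer. The names `ternarize`, `w_pos`, `w_neg`, and `t` are hypothetical, and the threshold rule is an assumption that the abstract does not state. During training the scaling factors would be learned, whereas here they are passed in as fixed numbers.

```python
def ternarize(weights, w_pos, w_neg, t=0.05):
    """Map full-precision weights to {-w_neg, 0, +w_pos}.

    Assumptions (not taken from the abstract): the threshold is
    t * max|w|, and w_pos / w_neg are positive per-layer scaling factors.
    Returns the ternary assignment (codes in {-1, 0, 1}) and the
    dequantized values used at inference time.
    """
    delta = t * max(abs(w) for w in weights)
    codes = []
    for w in weights:
        if w > delta:
            codes.append(1)       # assigned to the positive value
        elif w < -delta:
            codes.append(-1)      # assigned to the negative value
        else:
            codes.append(0)       # small weights are zeroed
    values = [w_pos if c == 1 else -w_neg if c == -1 else 0.0 for c in codes]
    return codes, values


# Hypothetical layer weights and scaling factors.
codes, values = ternarize([0.9, -0.02, 0.01, -0.7, 0.3], w_pos=1.2, w_neg=0.8)
print(codes)
print(values)
```

Only `codes` (2 bits per weight) and the two scaling factors need to be stored, which is where the size reduction comes from.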
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
