Ternary Neural Networks with Fine-Grained Quantization
Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

TL;DR
This paper introduces a fine-grained quantization method for ternarizing neural network weights and constraining activations to 8 and 4 bits, achieving high accuracy with significant reduction in multiplications, enabling efficient low-bit inference.
Contribution
The paper presents a novel FGQ approach for ternarizing pre-trained models with minimal accuracy loss and provides an improved theoretical formulation for better quantization quality.
Findings
Achieves Top-1 accuracy within 3.7% (ResNet-101) and 4.2% (ResNet-50) of the full-precision baseline.
Eliminates up to 75% of multiplications without retraining.
Enables a full 8/4-bit inference pipeline with up to 15x performance improvement.
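The 75% figure can be checked with a simple counting argument. The sketch below assumes that each group of N ternary weights shares a single scaling factor, so a group costs one multiplication instead of N, with the remaining operations being additions or subtractions. The function name and the group sizes tried are illustrative assumptions, not values taken from this page.

```python
def multiplications_eliminated(group_size: int) -> float:
    """Fraction of multiplications removed when each group of `group_size`
    ternary weights needs only one multiplication (by its shared scale).

    Assumption: a full-precision dot product over the group would need
    `group_size` multiplications, and the ternary version needs exactly one.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return 1.0 - 1.0 / group_size


if __name__ == "__main__":
    # Hypothetical group sizes, chosen only to show the trend.
    for n in (2, 4, 8):
        print(f"N={n}: {multiplications_eliminated(n):.1%} eliminated")
```

Under this assumption a group size of 4 gives exactly 75%, and larger groups remove more multiplications at the cost of a coarser approximation of the weights.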
Abstract
We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using FGQ. Our method involves ternarizing the original weight tensor in groups of N weights. Using N = 4, we achieve Top-1 accuracy within 3.7% and 4.2% of the baseline full precision result for Resnet-101 and Resnet-50 respectively, while eliminating 75% of all multiplications. These results enable a full 8/4-bit inference pipeline, with best-reported accuracy using ternary weights on ImageNet dataset, with a potential of 9x improvement in performance. Also, for smaller networks…
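Group-wise ternarization can be illustrated with a minimal sketch in plain Python. It is not the paper's algorithm: the threshold rule (0.7 times the mean absolute weight of the group) is a commonly used heuristic swapped in here as an assumption, and the function names, the flat weight list, and the default group size are all hypothetical. Each group is replaced by one scale `alpha` and codes in {-1, 0, +1}, so a dot product needs only one multiplication per group.

```python
from typing import List, Tuple

Group = Tuple[float, List[int]]  # (scale alpha, ternary codes)


def ternarize_group(weights: List[float], threshold_ratio: float = 0.7) -> Group:
    """Approximate one group of weights as alpha * codes, codes in {-1, 0, +1}.

    Assumption: the threshold is threshold_ratio * mean(|w|), a heuristic
    used here for illustration rather than the formulation in the paper.
    """
    if not weights:
        return 0.0, []
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    # Weights at or below the threshold become 0; the rest keep only their sign.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    # The scale is the mean magnitude of the weights that were not zeroed.
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, codes


def ternarize_fine_grained(weights: List[float], group_size: int = 4) -> List[Group]:
    """Split a flat weight list into consecutive groups and ternarize each."""
    return [
        ternarize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dot_with_ternary(groups: List[Group], activations: List[float],
                     group_size: int = 4) -> float:
    """Dot product using only add/subtract inside a group, one multiply per group."""
    total = 0.0
    for g, (alpha, codes) in enumerate(groups):
        base = g * group_size
        acc = 0.0
        for i, code in enumerate(codes):
            if code == 1:
                acc += activations[base + i]
            elif code == -1:
                acc -= activations[base + i]
        total += alpha * acc  # the single multiplication for this group
    return total


if __name__ == "__main__":
    w = [0.9, -0.8, 0.05, 0.0, 0.5, 0.5, -0.5, 0.5]
    x = [1.0, 2.0, 3.0, 4.0, 1.0, 1.0, 1.0, 1.0]
    groups = ternarize_fine_grained(w, group_size=4)
    print(groups)
    print(dot_with_ternary(groups, x, group_size=4))
```

Giving each small group its own scale lets the ternary codes follow local variations in weight magnitude, which is the intuition behind quantizing at fine granularity rather than with one scale per layer.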
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
