Ternary Neural Networks with Fine-Grained Quantization

Naveen Mellempudi; Abhisek Kundu; Dheevatsa Mudigere; Dipankar Das; Bharat Kaul; Pradeep Dubey

arXiv:1705.01462·cs.LG·May 31, 2017·61 cites

Open Access

TL;DR

This paper introduces a fine-grained quantization method that ternarizes the weights of pre-trained neural networks and constrains activations to 8 and 4 bits. It stays close to full-precision accuracy without retraining while eliminating most multiplications, which enables efficient low-bit inference.

Contribution

The paper presents a novel fine-grained quantization (FGQ) approach for ternarizing pre-trained models with minimal accuracy loss, together with an improved theoretical formulation that underpins a higher-quality quantization.

Findings

01

Achieves Top-1 accuracy within 3.7% (ResNet-101) and 4.2% (ResNet-50) of the full-precision baseline with groups of N = 4 weights.

02

Eliminates 75% of all multiplications at N = 4, without retraining.

03

Enables a full 8/4-bit inference pipeline with a potential 9x performance improvement.

Abstract

We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using FGQ. Our method involves ternarizing the original weight tensor in groups of $N$ weights. Using $N = 4$, we achieve Top-1 accuracy within $3.7\%$ and $4.2\%$ of the baseline full precision result for Resnet-101 and Resnet-50 respectively, while eliminating $75\%$ of all multiplications. These results enable a full 8/4-bit inference pipeline, with best-reported accuracy using ternary weights on ImageNet dataset, with a potential of $9 \times$ improvement in performance. Also, for smaller networks…
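The idea of ternarizing weights in groups of $N$ can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the threshold rule (a fixed fraction of the group's mean magnitude), the names `ternarize_groups`, `ternary_dot`, `multiplications_eliminated`, and the parameter `threshold_factor` are all assumptions made for illustration. The sketch also assumes that each group contributes one scaling multiplication per $N$ weights, which is one way to arrive at the $75\%$ figure for $N = 4$.

```python
import numpy as np


def ternarize_groups(weights, group_size=4, threshold_factor=0.7):
    """Ternarize a flat weight vector in groups of `group_size`.

    Returns (codes, scales) such that weights ~= scales[:, None] * codes,
    where codes holds values in {-1, 0, +1}, one row per group.
    """
    w = np.asarray(weights, dtype=np.float64)
    if w.size % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    groups = w.reshape(-1, group_size)
    magnitudes = np.abs(groups)

    # ASSUMPTION: a simple heuristic threshold per group, used here as a
    # stand-in. The paper derives its own thresholds and scales.
    delta = threshold_factor * magnitudes.mean(axis=1, keepdims=True)
    keep = magnitudes > delta
    codes = (np.sign(groups) * keep).astype(np.int8)

    # One scale per group: mean magnitude of the weights that were kept.
    counts = keep.sum(axis=1)
    sums = (magnitudes * keep).sum(axis=1)
    scales = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return codes, scales


def ternary_dot(codes, scales, activations):
    """Dot product using ternary codes: add/subtract within each group,
    then a single multiplication by the group's scale."""
    acts = np.asarray(activations, dtype=np.float64).reshape(codes.shape)
    partial = (codes * acts).sum(axis=1)  # only +/- accumulation in hardware
    return float((partial * scales).sum())


def multiplications_eliminated(group_size):
    """Fraction of multiplications removed, assuming one multiply per group."""
    return 1.0 - 1.0 / group_size


weights = [0.9, -0.1, 0.05, -0.8, 0.2, 0.2, 0.2, 0.2]
codes, scales = ternarize_groups(weights, group_size=4)
result = ternary_dot(codes, scales, np.arange(8.0))
```

Under the same one-multiply-per-group assumption, larger groups remove a larger share of multiplications but leave fewer scales to absorb variation in the weights, so the group size trades accuracy for arithmetic savings.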

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax