Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point
Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, and Bharat Kaul

TL;DR
This paper introduces a cluster-based quantization method for deep learning inference that converts full precision weights into ternary weights and constrains activations to 8 bits, enabling efficient low-precision computation with minimal accuracy loss.
Contribution
The authors propose a novel cluster-based quantization approach that effectively reduces multiplications to ternary and 8-bit operations, maintaining high accuracy in deep neural networks.
Findings
Achieves 71.8% TOP-1 accuracy with ResNet-101 using ternary weights and 8-bit activations.
Replaces approximately 85% of all multiplications with 8-bit accumulations at a cluster size of N=4.
Shows a trade-off in cluster size: larger clusters (e.g. N=64) convert more of the computation to ternary operations but reduce accuracy, which requires retraining.
Abstract
We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition, we also constrain the activations to 8 bits, thus enabling a sub-8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing the number of ternary operations. We show that with a cluster size of N=4 on ResNet-101, we can achieve 71.8% TOP-1 accuracy, within 6% of the best full precision results, while replacing ~85% of all multiplications with 8-bit accumulations. Using the same method with 4-bit weights achieves 76.3% TOP-1 accuracy, which is within 2% of the full precision result. We also study the impact of the size of the cluster on both performance and accuracy; larger cluster sizes N=64 can replace ~98% of the…
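The idea of sharing one scaling factor across a cluster of N filters can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the threshold rule (0.7 times the mean absolute weight of the cluster), the function names `ternarize_clusters` and `dequantize`, and the 2-D weight layout are all assumptions made for illustration. It only shows how each cluster ends up with ternary codes in {-1, 0, +1} plus a single scale.

```python
import numpy as np

def ternarize_clusters(weights, cluster_size=4, threshold_ratio=0.7):
    """Ternarize a (num_filters, fan_in) weight matrix cluster by cluster.

    Assumption: the threshold is threshold_ratio * mean(|w|) over the
    cluster. This is an illustrative heuristic, not the paper's rule.
    Returns int8 codes in {-1, 0, +1} and one scale per cluster.
    """
    num_filters = weights.shape[0]
    ternary = np.zeros_like(weights, dtype=np.int8)
    scales = []
    for start in range(0, num_filters, cluster_size):
        block = weights[start:start + cluster_size]
        delta = threshold_ratio * np.mean(np.abs(block))
        mask = np.abs(block) > delta  # weights that stay non-zero
        # Shared scale: mean magnitude of the surviving weights.
        alpha = np.abs(block[mask]).mean() if mask.any() else 0.0
        ternary[start:start + cluster_size] = (np.sign(block) * mask).astype(np.int8)
        scales.append(alpha)
    return ternary, np.array(scales)

def dequantize(ternary, scales, cluster_size=4):
    """Expand per-cluster scales back to per-filter and rescale the codes."""
    per_filter = np.repeat(scales, cluster_size)[:ternary.shape[0], None]
    return per_filter * ternary

# Usage: compare reconstruction error for two cluster sizes.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128))
for n in (4, 64):
    codes, alphas = ternarize_clusters(w, cluster_size=n)
    err = np.mean((w - dequantize(codes, alphas, n)) ** 2)
    print(f"N={n}: {len(alphas)} scales, mean squared error {err:.4f}")
```

With ternary codes, the dot product inside a cluster needs only additions and subtractions of the 8-bit activations; the multiplication by the shared scale happens once per cluster output, which is why larger clusters leave fewer multiplications.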
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
