Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

Karina Vasquez; Yeshwanth Venkatesha; Abhiroop Bhattacharjee; Abhishek Moitra; Priyadarshini Panda

arXiv:2101.04354·cs.LG·January 13, 2021

TL;DR

This paper introduces an in-training mixed-precision quantization method based on Activation Density, reducing energy consumption and training complexity for neural networks on embedded devices, with hardware acceleration support.

Contribution

It proposes a novel AD-based in-training quantization approach that eliminates re-training and achieves significant energy and MAC reductions.

Findings

01. Achieves ~4.5x reduction in multiply-accumulate (MAC) operations with competitive accuracy.

02. Reduces training complexity by 50%.

03. Yields ~5x energy savings on processing-in-memory (PIM) hardware.

Abstract

As neural networks gain widespread adoption in embedded devices, there is a need for model compression techniques to facilitate deployment in resource-constrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most approaches take a fully trained model, apply different heuristics to determine the optimal bit-precision for different layers of the network, and retrain the network to recover any drop in accuracy. We propose an in-training quantization method based on Activation Density (AD), the proportion of non-zero activations in a layer. Our method calculates the bit-width for each layer during training, yielding a mixed-precision model with competitive accuracy. Since we train lower-precision models during training, our approach yields the final quantized model at lower training complexity and also eliminates the need for re-training. We…
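To make the Activation Density definition concrete, here is a minimal Python sketch. The `activation_density` function follows the abstract's definition directly (the proportion of non-zero activations in a layer). The `next_bit_width` helper is a hypothetical illustration only: the abstract does not state how AD is mapped to a bit-width, so scaling the current bit-width by AD and rounding is an assumption made for this example, as are the function names and the sample values.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of non-zero entries in a layer's flattened activations."""
    if not activations:
        raise ValueError("activations must be non-empty")
    nonzero = sum(1 for a in activations if a != 0)
    return nonzero / len(activations)


def next_bit_width(current_bits: int, density: float, min_bits: int = 1) -> int:
    """ASSUMED update rule (not given in the abstract): scale the current
    bit-width by the layer's activation density, round, and clamp below."""
    return max(min_bits, round(current_bits * density))


# Made-up post-ReLU activations for one layer: 3 of the 8 values are non-zero.
layer_out = [0.0, 1.2, 0.0, 0.0, 0.7, 0.0, 3.1, 0.0]
ad = activation_density(layer_out)   # 3 / 8 = 0.375
bits = next_bit_width(16, ad)        # round(16 * 0.375) = 6
print(f"AD = {ad:.3f}, assumed next bit-width = {bits}")
```

Under this assumed rule, a sparser layer (lower AD) would be assigned fewer bits, which is one plausible way a per-layer, mixed-precision assignment could arise during training.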

Taxonomy

Methods: Pruning