Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
Karina Vasquez, Yeshwanth Venkatesha, Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

TL;DR
This paper introduces an in-training mixed-precision quantization method based on Activation Density (AD) that reduces energy consumption and training complexity for neural networks on embedded devices, and evaluates the resulting energy savings on processing-in-memory (PIM) hardware.
Contribution
It proposes a novel AD-based in-training quantization approach that eliminates re-training and achieves significant energy and MAC reductions.
Findings
Achieves ~4.5x MAC reduction with competitive accuracy.
Reduces training complexity by 50%.
Yields ~5x energy savings on PIM hardware.
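The Findings report a MAC reduction without stating how MACs at different bit-widths are counted. One simple convention for comparing a mixed-precision model against a full-precision baseline is to weight each layer's MAC count by its bit-width. The sketch below assumes exactly that linear-cost convention; the `Layer` record, the `bit_scaled_mac_ratio` helper, the 16-bit baseline, and the layer sizes are all hypothetical illustrations, not the paper's metric or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical per-layer record used only for this illustration."""
    name: str
    macs: int  # multiply-accumulate operations in the layer
    bits: int  # bit-width assigned to the layer


def bit_scaled_mac_ratio(layers: List[Layer], baseline_bits: int = 16) -> float:
    """Baseline cost divided by mixed-precision cost, ASSUMING each MAC's cost
    scales linearly with its bit-width (a simplification, not the paper's metric)."""
    baseline = sum(layer.macs * baseline_bits for layer in layers)
    mixed = sum(layer.macs * layer.bits for layer in layers)
    return baseline / mixed


if __name__ == "__main__":
    # Made-up layer sizes and bit-widths, not results from the paper.
    layers = [
        Layer("conv1", macs=1000, bits=8),
        Layer("conv2", macs=2000, bits=4),
        Layer("fc", macs=1000, bits=4),
    ]
    print(f"{bit_scaled_mac_ratio(layers):.2f}x")  # prints 3.20x
```

Under this linear assumption, a layer cut from 16 to 4 bits counts as a 4x reduction for that layer. Real hardware cost does not necessarily scale linearly with bit-width, so the authors' accounting may differ.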
Abstract
As neural networks gain widespread adoption in embedded devices, there is a need for model compression techniques that facilitate deployment in resource-constrained environments. Quantization is one of the go-to methods for state-of-the-art model compression. Most approaches take a fully trained model, apply heuristics to determine the optimal bit-precision for each layer of the network, and then retrain the network to recover any drop in accuracy. Based on Activation Density (AD), the proportion of non-zero activations in a layer, we propose an in-training quantization method. Our method calculates the bit-width of each layer during training, yielding a mixed-precision model with competitive accuracy. Because training itself is carried out at lower precision, our approach produces the final quantized model at lower training complexity and also eliminates the need for re-training. We…
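The abstract defines AD as the proportion of non-zero activations in a layer and says bit-widths are computed from it during training, but this excerpt does not give the exact mapping. The sketch below computes AD directly from that definition and pairs it with a hypothetical update rule, `next_bitwidth`, that scales a layer's current bit-width by its AD and rounds up. The function names, the `min_bits` floor, and the 16-bit starting precision are assumptions for illustration, not the paper's formula.

```python
import numpy as np


def activation_density(acts: np.ndarray) -> float:
    """Activation Density: fraction of non-zero entries in a layer's output."""
    return np.count_nonzero(acts) / acts.size


def next_bitwidth(ad: float, current_bits: int, min_bits: int = 2) -> int:
    """HYPOTHETICAL update rule (not taken from the paper): shrink the layer's
    bit-width in proportion to its AD, rounding up and never going below min_bits."""
    return max(min_bits, int(np.ceil(ad * current_bits)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated post-ReLU activations for one layer (batch of 64, 128 units).
    acts = np.maximum(rng.standard_normal((64, 128)), 0.0)
    ad = activation_density(acts)  # close to 0.5 for zero-mean Gaussian inputs
    bits = next_bitwidth(ad, current_bits=16)
    print(f"AD={ad:.3f}, next bit-width={bits}")
```

In an actual in-training scheme, a rule like this would be applied per layer at chosen points during training, with subsequent epochs run at the reduced precision. That is the property the abstract credits for avoiding a separate re-training phase.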
Taxonomy
Methods: Pruning
