Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression
Shihui Yin, Gaurav Srivastava, Shreyas K. Venkataramanaiah, Chaitali Chakrabarti, Visar Berisha, Jae-sun Seo

TL;DR
This paper introduces combined low-precision and structured sparsity techniques during training to minimize the area and energy consumption of deep neural network hardware with minimal accuracy loss.
Contribution
It presents a novel approach integrating structured sparsity and low-precision weights during training for optimized DNN hardware design.
Findings
50X weight memory reduction on CIFAR-10 while maintaining accuracy
98.4% accuracy at 20 nJ per classification on MNIST
Structured sparsity and low-precision weights combine effectively for hardware efficiency
Abstract
Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms. To reduce the network size, several studies investigated compression by introducing element-wise or row-/column-/block-wise sparsity via pruning and regularization. In addition, many recent works have focused on reducing the precision of activations and weights, with some going down to a single bit. However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks has not been comprehensively explored. In this work, we present design techniques for minimum-area/-energy DNN hardware with minimal degradation in accuracy. During training, both…
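To make the two ideas in the abstract concrete, here is a minimal sketch of how block-wise sparsity and low-precision weights multiply into a smaller weight memory. It is an illustration under stated assumptions, not the authors' method: the paper applies both constraints during training, whereas this sketch prunes and quantizes a random matrix after the fact. The helper names (block_prune, quantize), the 4x4 block size, the 25% keep ratio, the 2-bit precision, and the 32-bit floating-point baseline are all hypothetical choices, and index storage overhead is ignored.

```python
import random


def block_prune(weights, block, keep_ratio):
    """Zero out whole block x block tiles, keeping those with the largest L1 norm.

    Hypothetical helper: post-hoc magnitude pruning stands in for the
    training-time structured sparsity described in the abstract.
    """
    rows, cols = len(weights), len(weights[0])
    tiles = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            norm = sum(abs(weights[i][j])
                       for i in range(r, min(r + block, rows))
                       for j in range(c, min(c + block, cols)))
            tiles.append((norm, r, c))
    tiles.sort(reverse=True)
    n_keep = max(1, round(len(tiles) * keep_ratio))
    kept = {(r, c) for _, r, c in tiles[:n_keep]}
    pruned = [[w if (i - i % block, j - j % block) in kept else 0.0
               for j, w in enumerate(row)]
              for i, row in enumerate(weights)]
    return pruned, n_keep


def quantize(weights, bits):
    """Uniform symmetric quantization to signed integers scaled back to floats."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit, 3 for 3-bit
    max_abs = max(abs(w) for row in weights for w in row)
    if max_abs == 0.0:
        return [[0.0 for _ in row] for row in weights]
    scale = max_abs / qmax
    return [[max(-qmax, min(qmax, round(w / scale))) * scale for w in row]
            for row in weights]


# Assumed toy setup: an 8x8 layer, 4x4 blocks, keep 1 of 4 blocks, 2-bit weights.
random.seed(0)
ROWS, COLS, BLOCK, BITS = 8, 8, 4, 2
W = [[random.gauss(0.0, 1.0) for _ in range(COLS)] for _ in range(ROWS)]

pruned, n_keep = block_prune(W, block=BLOCK, keep_ratio=0.25)
Wq = quantize(pruned, bits=BITS)

dense_bits = ROWS * COLS * 32                 # assumed 32-bit float baseline
compressed_bits = n_keep * BLOCK * BLOCK * BITS  # index overhead ignored
ratio = dense_bits / compressed_bits
print(f"weight memory reduction: {ratio:.0f}x")
```

With these toy settings the two factors compose multiplicatively: 4x from keeping one block in four and 16x from going from 32 bits to 2 bits, for 64x overall. The figure is a property of the chosen parameters only and is not meant to reproduce the reductions reported in the paper.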
Taxonomy
Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing
Methods: Pruning
