Bit-pragmatic Deep Neural Network Computing
J. Albericio, P. Judd, A. Delmás, S. Sharify, A. Moshovos

TL;DR
This paper introduces Pragmatic (PRA), a novel DNN accelerator architecture that reduces ineffectual computations by calculating only non-zero terms in multiplications, significantly improving performance and energy efficiency.
Contribution
PRA is the first architecture to exploit zero product terms and excess precision in multiplications for DNNs, enhancing efficiency over existing accelerators.
Findings
PRA achieves a 2.6x performance improvement over DaDianNao (DaDN)
PRA improves energy efficiency by up to 28%
Performance gains persist with 8-bit quantization
Abstract
We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it, improving performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers, which internally generate multiple terms, that is, products of the multiplicand and powers of two, which added together produce the final product [1]. At runtime, many of these terms are zero, as they are generated when the multiplicand is combined with the zero-bits of the multiplicator. While conventional bit-parallel multipliers calculate all terms in parallel to reduce individual product latency, PRA calculates only the non-zero terms using a) on-the-fly conversion of the multiplicator representation into an explicit list of powers…
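The term-level view of multiplication described in the abstract can be sketched in a few lines of Python. This is only an illustration of the arithmetic idea (a product is the sum of the multiplicand shifted by each set bit of the multiplicator, so zero bits contribute nothing), not a model of the PRA hardware. The function names `nonzero_powers` and `multiply_nonzero_terms` are hypothetical and chosen for this example, and the sketch assumes non-negative integer operands.

```python
def nonzero_powers(multiplicator: int) -> list[int]:
    """Return the exponents of the powers of two set in a non-negative integer."""
    if multiplicator < 0:
        raise ValueError("this sketch assumes a non-negative multiplicator")
    powers = []
    position = 0
    while multiplicator:
        if multiplicator & 1:          # a one-bit yields a non-zero term
            powers.append(position)
        multiplicator >>= 1
        position += 1
    return powers


def multiply_nonzero_terms(multiplicand: int, multiplicator: int) -> tuple[int, int]:
    """Multiply by summing only the non-zero terms.

    Returns the product and the number of terms that were actually added.
    """
    powers = nonzero_powers(multiplicator)
    # Each term is the multiplicand times a power of two, i.e. a left shift.
    product = sum(multiplicand << p for p in powers)
    return product, len(powers)


if __name__ == "__main__":
    # 20 is 0b10100: a 16-bit bit-parallel multiplier would form 16 terms,
    # but only the two terms for bit positions 2 and 4 are non-zero.
    product, terms = multiply_nonzero_terms(7, 20)
    print(product, terms)
```

In this example the multiplicator 20 has two one-bits, so only two shifted copies of the multiplicand are summed; the remaining terms a fixed-width multiplier would generate are zero and are skipped.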
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Advanced Neural Network Applications
