Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing
James Garland, David Gregg

TL;DR
This paper introduces PASM (parallel accumulate shared MAC), a low-complexity multiply-accumulate unit for weight-shared CNN accelerators that reduces power and area while maintaining performance and is suitable for both ASIC and FPGA implementations.
Contribution
It proposes the PASM architecture that re-structures MAC operations to count weight frequencies, significantly lowering gate count and power in CNN hardware accelerators.
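The regrouping behind this restructuring can be written as a short identity. This is a sketch in our own notation, not the paper's: assume a dot product of N input activations x_i, where each weight is replaced by a shared value W_{k_i} selected by a bin index k_i in {0, ..., B-1}. Gathering the inputs that share a bin into an accumulator S_b gives:

$$
y \;=\; \sum_{i=0}^{N-1} W_{k_i}\, x_i
\;=\; \sum_{b=0}^{B-1} W_b \Big( \sum_{i \,:\, k_i = b} x_i \Big)
\;=\; \sum_{b=0}^{B-1} W_b\, S_b ,
\qquad S_b = \sum_{i \,:\, k_i = b} x_i .
$$

Under these assumptions, the N multiplications of a conventional MAC become N additions into bin accumulators followed by only B multiplications, which is where the savings come from when B is much smaller than N.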
Findings
Fewer gates and lower power consumption than a conventional MAC in ASIC implementation.
Comparable latency with traditional MAC units.
Effective FPGA implementation with limited DSP units.
Abstract
Convolutional neural networks (CNNs) are one of the most successful machine learning techniques for image, voice and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators have been proposed for CNNs that typically contain large numbers of multiply-accumulate (MAC) units, whose multipliers are costly in integrated circuit (IC) gate count and power consumption. "Weight sharing" accelerators have been proposed in which the full range of weight values in a trained CNN is compressed and put into bins, and the bin index is used to access the weight-shared value. We reduce the power and area of the CNN by implementing a parallel accumulate shared MAC (PASM) in a weight-shared CNN. PASM re-architects the MAC to instead count the frequency of each weight and place it in a bin. The accumulated value is computed in a subsequent multiply…
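To make the bin-then-multiply idea concrete, here is a minimal Python sketch, not the authors' hardware design. It assumes that each bin accumulator adds the input activations whose weight index selects that bin, and that the bins are multiplied by their shared weights afterwards. The function names `conventional_mac` and `pasm_style_mac` and the example data are hypothetical, chosen only for illustration. Both functions return the same dot product along with the number of multiplications each performs.

```python
from typing import List, Sequence, Tuple


def conventional_mac(indices: Sequence[int], activations: Sequence[int],
                     shared_weights: Sequence[int]) -> Tuple[int, int]:
    """Weight-shared MAC: look up each weight by its bin index, then multiply-accumulate.

    Returns (result, number_of_multiplications).
    """
    acc = 0
    multiplies = 0
    for idx, x in zip(indices, activations):
        acc += shared_weights[idx] * x  # one multiply per input
        multiplies += 1
    return acc, multiplies


def pasm_style_mac(indices: Sequence[int], activations: Sequence[int],
                   shared_weights: Sequence[int]) -> Tuple[int, int]:
    """Hypothetical PASM-style sketch: accumulate inputs per bin, then multiply once per bin.

    Returns (result, number_of_multiplications).
    """
    bins: List[int] = [0] * len(shared_weights)
    for idx, x in zip(indices, activations):
        bins[idx] += x  # addition only; the bin index selects the accumulator
    acc = 0
    multiplies = 0
    for w, s in zip(shared_weights, bins):
        acc += w * s  # one multiply per bin, not per input
        multiplies += 1
    return acc, multiplies


if __name__ == "__main__":
    shared_weights = [-3, 1, 4, 7]          # 4 bins, i.e. 2-bit weight indices
    indices = [0, 2, 2, 1, 3, 0, 2, 1]      # per-input weight bin index
    activations = [5, 2, 1, 8, 3, 4, 6, 7]  # input values
    print(conventional_mac(indices, activations, shared_weights))  # (45, 8)
    print(pasm_style_mac(indices, activations, shared_weights))    # (45, 4)
```

In this toy example, eight inputs need eight multiplications in the conventional unit but only four (one per bin) in the bin-accumulate version. The trade is extra bin accumulators and a final multiply stage, which is consistent with the abstract's point that the accumulated value is computed in a subsequent multiply step.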
