A MAC-less Neural Inference Processor Supporting Compressed, Variable Precision Weights
Vincenzo Liguori

TL;DR
This paper presents two CNN inference architectures that exploit weight sparsity and compression to reduce computational and bandwidth demands; the second also supports variable precision weights.
Contribution
It introduces a MAC-based architecture that skips zero weights and a MAC-less architecture that exploits weight sparsity at the bit level, replacing MACs with much smaller Bit Layer Multiply Accumulators (BLMACs) that support variable precision weights.
Findings
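Both architectures reduce computational complexity and bandwidth by exploiting weight sparsity and compression.
Implementation results are reported for a pathfinder design and various technologies.
BLMACs, which are much smaller than MACs, support variable precision weights as variable size integers and even floating-point values.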
Abstract
This paper introduces two architectures for the inference of convolutional neural networks (CNNs). Both architectures exploit weight sparsity and compression to reduce computational complexity and bandwidth. The first architecture uses multiply-accumulators (MACs) but avoids unnecessary multiplications by skipping zero weights. The second architecture exploits weight sparsity at the level of the weights' bit representation by substituting resource-intensive MACs with much smaller Bit Layer Multiply Accumulators (BLMACs). The use of BLMACs also allows variable precision weights as variable size integers and even floating-point values. Some details of an implementation of the second architecture are given. Weight compression with arithmetic coding is also discussed as well as bandwidth implications. Finally, some implementation results for a pathfinder design and various technologies are presented.
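The two ideas in the abstract can be illustrated with a minimal software sketch. It is not the paper's hardware design: the function names, the sign-magnitude treatment of weights, and the operation counters are assumptions made for illustration only. The first function skips zero weights in an ordinary multiply-accumulate loop. The second computes the same dot product one weight bit layer at a time, from the most significant bit down, using only shifts and additions, so the work scales with the number of non-zero weight bits rather than with the number of weights.

```python
def dot_mac_skip_zeros(weights, inputs):
    """Dot product with ordinary MACs, skipping zero weights.

    Returns (result, number_of_multiplications_performed).
    """
    acc = 0
    macs = 0
    for w, x in zip(weights, inputs):
        if w == 0:
            continue  # a zero weight contributes nothing, so skip the multiply
        acc += w * x
        macs += 1
    return acc, macs


def dot_bit_layers(weights, inputs, n_bits):
    """Dot product computed bit layer by bit layer, without multiplications.

    Assumption: weights are integers handled as sign plus magnitude, and
    n_bits is large enough to hold the largest magnitude.
    Returns (result, number_of_additions_performed).
    """
    acc = 0
    adds = 0
    for bit in range(n_bits - 1, -1, -1):  # most significant layer first
        acc <<= 1  # doubling the accumulator moves on to the next lower layer
        for w, x in zip(weights, inputs):
            if (abs(w) >> bit) & 1:  # only non-zero weight bits cost work
                acc += x if w > 0 else -x
                adds += 1
    return acc, adds


weights = [3, 0, -5, 1]
inputs = [2, 7, 4, -6]
print(dot_mac_skip_zeros(weights, inputs))  # (-20, 3)
print(dot_bit_layers(weights, inputs, 3))   # (-20, 5)
```

Because the number of bit layers is a loop bound rather than a fixed datapath width, the same routine handles weights of different precisions by changing `n_bits`, which is the software analogue of the variable precision property the abstract attributes to BLMACs.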
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Human Pose and Action Recognition
