TL;DR
S2Engine is a new systolic architecture designed to fully exploit CNN sparsity, significantly improving speed and energy efficiency by transmitting compressed data and enabling dynamic data selection in convolution.
Contribution
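It introduces a novel systolic architecture that leverages CNN sparsity through compressed data transmission and dynamic data selection, overcoming the regular internal data paths and computational overheads that keep traditional systolic arrays from benefiting from fine-grained sparsity.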
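The idea of sending compressed data and selecting operands dynamically can be illustrated in software. The sketch below is not the S2Engine hardware design. It is a minimal analogy that assumes each stream is compressed into (position, value) pairs and that operands are paired by matching positions. The names `compress` and `sparse_dot` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

# Assumed compressed format for this sketch: (position, value) pairs.
Compressed = List[Tuple[int, float]]


def compress(dense: List[float]) -> Compressed:
    """Keep only the nonzero entries, tagged with their original position."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def sparse_dot(features: Compressed, weights: Compressed) -> Tuple[float, int]:
    """Dot product over two compressed streams.

    Walks both streams in position order and multiplies only when the
    positions match. Returns the result and the number of
    multiply-accumulate operations actually performed.
    """
    i = j = 0
    acc = 0.0
    macs = 0
    while i < len(features) and j < len(weights):
        f_pos, f_val = features[i]
        w_pos, w_val = weights[j]
        if f_pos == w_pos:
            # Both operands are nonzero at this position: useful work.
            acc += f_val * w_val
            macs += 1
            i += 1
            j += 1
        elif f_pos < w_pos:
            i += 1  # the weight at f_pos is zero, skip this feature
        else:
            j += 1  # the feature at w_pos is zero, skip this weight
    return acc, macs


features = [0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 1.0, 0.0]
weights = [0.5, 0.0, 0.0, 4.0, 2.0, 0.0, -1.0, 0.0]

result, macs = sparse_dot(compress(features), compress(weights))
dense_result = sum(f * w for f, w in zip(features, weights))
dense_macs = len(features)
```

In this toy input the compressed walk performs 2 multiply-accumulates where a dense pass would perform 8, and both produce the same result. How the real architecture encodes positions and performs the matching inside the array is described in the paper, not here.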
Findings
Achieves a 3.2x speedup over a naive systolic array.
Attains a 3.0x improvement in energy efficiency.
Effectively exploits CNN sparsity with compressed data flow.
Abstract
Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, execution of CNNs requires a large amount of computing resources and generates heavy memory traffic, which imposes a severe challenge on computing system design. Through optimizing parallel executions and data reuse in convolution, systolic architecture demonstrates great advantages in accelerating CNN computations. However, the regular internal data transmission path in traditional systolic architecture prevents it from completely leveraging the benefits introduced by neural network sparsity. Deployment of fine-grained sparsity on existing systolic architectures is greatly hindered by the incurred computational overheads. In this work, we propose S2Engine, a novel systolic architecture that can fully exploit the sparsity in CNNs with maximized data reuse.…
