Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
Hyunsung Yoon, Sungju Ryu, Jae-Joon Kim

TL;DR
This paper proposes an area and energy-efficient method for executing sparse neural networks on dense matrix multiplication accelerators, challenging the idea that dedicated sparse PEs are always better.
Contribution
It demonstrates that, under a fixed area budget, using a larger number of dense PEs can be more efficient than using sparse PEs for sparse neural network computation.
Findings
A larger number of dense PEs outperforms sparse PEs in area and energy efficiency.
The proposed method reduces computation overhead for sparse neural networks.
Dense accelerators can effectively process sparse networks with the right design.
Abstract
As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, DNNs require a large amount of computation and a large memory footprint. Pruning, which produces a sparse neural network, is one of the solutions to reduce the computational complexity of neural network processing. To maximize the performance of the computations with such compressed data, dedicated sparse neural network accelerators have been introduced, but complex circuits for matching the indices of non-zero inputs/weights cause large overhead in area and power of processing elements (PEs). The sparse PE becomes significantly larger than the dense PE, which raises an interesting question for designers: "Given the area, isn't it better to use a larger number of dense PEs despite the low utilization in sparse matrix computations?" In this paper, we show that the answer is "yes", and demonstrate an…
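The iso-area question posed in the abstract can be made concrete with a toy model. The sketch below is an illustration only, not the paper's evaluation method: the function names, the one-MAC-per-cycle PE model, and every number in it are assumptions chosen for the example. It treats a dense PE as doing useful work only for the fraction of cycles given by its utilization, and a sparse PE as idealized (always useful) but larger.

```python
def effective_throughput(area_budget: float, pe_area: float, utilization: float) -> float:
    """Useful MACs per cycle for the PEs that fit in area_budget.

    Assumes each PE issues one MAC per cycle and that only the fraction
    `utilization` of those MACs involve non-zero operands (hypothetical model).
    """
    num_pes = int(area_budget // pe_area)  # whole PEs only
    return num_pes * utilization


def dense_wins(area_budget: float,
               dense_pe_area: float,
               sparse_pe_area: float,
               dense_utilization: float,
               sparse_utilization: float = 1.0) -> bool:
    """True if the dense array delivers more useful MACs per cycle in the same area."""
    dense = effective_throughput(area_budget, dense_pe_area, dense_utilization)
    sparse = effective_throughput(area_budget, sparse_pe_area, sparse_utilization)
    return dense > sparse


if __name__ == "__main__":
    # Illustrative numbers only; none of these come from the paper.
    area = 100.0
    for sparse_area in (2.0, 5.0):
        print(sparse_area, dense_wins(area, dense_pe_area=1.0,
                                      sparse_pe_area=sparse_area,
                                      dense_utilization=0.25))
```

Under this simplified model, the dense array comes out ahead whenever the sparse-to-dense PE area ratio exceeds the reciprocal of the dense utilization. A real comparison would also have to account for energy, memory traffic, and load imbalance, none of which are modeled here.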
