Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators

Hyunsung Yoon; Sungju Ryu; Jae-Joon Kim

arXiv:2604.26587·cs.AR·April 30, 2026

TL;DR

This paper proposes an area and energy-efficient method for executing sparse neural networks on dense matrix multiplication accelerators, challenging the idea that dedicated sparse PEs are always better.

Contribution

It demonstrates that, under a fixed area budget, a larger number of dense PEs can be more efficient than a smaller number of area-hungry sparse PEs for sparse neural network computation.

Findings

01 Under the same area budget, a larger number of dense PEs outperforms sparse PEs in area and energy efficiency.

02 The proposed method reduces computation overhead for sparse neural networks.

03 Dense accelerators can effectively process sparse networks with the right design.

Abstract

As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, they require a large amount of computation and a large memory footprint. Pruning, which produces a sparse neural network, is one way to reduce the computational complexity of neural network processing. To maximize the performance of computations on such compressed data, dedicated sparse neural network accelerators have been introduced, but the complex circuits that match the indices of non-zero inputs and weights add a large area and power overhead to the processing elements (PEs). The sparse PE becomes significantly larger than the dense PE, which raises an interesting question for designers: "Given the area, isn't it better to use a larger number of dense PEs despite the low utilization in sparse matrix computations?" In this paper, we show that the answer is "yes", and demonstrate an…
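The area question posed in the abstract can be made concrete with a back-of-the-envelope model. The sketch below is not the paper's evaluation or its proposed method. It is a toy comparison in which every number (area budget, PE areas, operand density) is a hypothetical value chosen for illustration, and the helper `effective_throughput` is a name invented here. It assumes a dense PE does useful work only on the fraction of multiply-accumulates (MACs) whose operands are both non-zero, while a sparse PE is idealized as always doing useful work.

```python
def effective_throughput(area_budget, pe_area, utilization):
    """Useful MACs per cycle when area_budget is filled with identical PEs.

    utilization is the fraction of cycles in which a PE does useful work.
    """
    num_pes = int(area_budget // pe_area)
    return num_pes * utilization


# All values below are hypothetical, not taken from the paper.
AREA_BUDGET = 1000.0    # arbitrary area units
DENSE_PE_AREA = 1.0     # area of one dense PE
SPARSE_PE_AREA = 5.0    # assumed: index-matching logic makes a sparse PE 5x larger
DENSITY = 0.25          # assumed fraction of MACs with both operands non-zero

# Dense array: many PEs, but only DENSITY of their MACs are useful.
dense = effective_throughput(AREA_BUDGET, DENSE_PE_AREA, DENSITY)

# Sparse array: fewer PEs, idealized as 100% useful work.
sparse = effective_throughput(AREA_BUDGET, SPARSE_PE_AREA, 1.0)

# In this toy model, dense wins whenever DENSITY exceeds the area ratio.
break_even_density = DENSE_PE_AREA / SPARSE_PE_AREA

print(f"dense:  {dense} useful MACs/cycle")
print(f"sparse: {sparse} useful MACs/cycle")
print(f"break-even density: {break_even_density}")
```

With these assumed values, the dense array delivers 250 useful MACs per cycle against 200 for the sparse array, and the two break even at a density of 0.2. The model ignores energy, memory bandwidth, and load imbalance, so it only illustrates why the question is worth asking, not how the paper answers it.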
