DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution

Xin Yang; Xin Fan; Zengshi Wang; Jun Han

arXiv:2511.15367 · cs.AR · November 20, 2025

TL;DR

DARE introduces a Matrix Processing Unit (MPU) architecture with a densifying ISA and filtered runahead execution. It targets sparse DNN workloads by addressing irregular memory accesses and poor compute utilization, improving performance by up to 4.44× and energy efficiency by up to 22.8×.
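
As the abstract below notes, applying runahead execution directly to an MPU issues many redundant prefetches. As a rough illustration only, and not DARE's actual filter (whose design this page does not describe), the Python sketch below drops candidate prefetches whose cache line is already resident or was already issued in the same runahead interval. The 64-byte line size, the names `line_of` and `filter_prefetches`, and the addresses are all hypothetical.

```python
from typing import Iterable, List, Set

LINE_BYTES = 64  # assumed cache-line size in bytes (hypothetical)


def line_of(addr: int) -> int:
    """Map a byte address to its cache-line index."""
    return addr // LINE_BYTES


def filter_prefetches(addrs: Iterable[int], resident: Set[int]) -> List[int]:
    """Return the cache lines worth prefetching, in first-seen order.

    A candidate address generated during runahead is dropped if its line is
    already resident in the cache or was already issued earlier in this
    runahead interval, since prefetching it would bring in no new data.
    """
    issued: Set[int] = set()
    useful: List[int] = []
    for addr in addrs:
        line = line_of(addr)
        if line in resident or line in issued:
            continue  # redundant prefetch: filter it out
        issued.add(line)
        useful.append(line)
    return useful


# Made-up addresses produced by pre-executing upcoming tile loads.
candidates = [0, 16, 32, 48, 64, 4096, 4112, 0, 8192]
resident = {line_of(8192)}  # line 128 is assumed to be cached already
print(filter_prefetches(candidates, resident))  # [0, 1, 64]
```

Here nine candidate addresses collapse to three useful prefetches. A hardware filter would have to work with bounded state, which this set-based model ignores.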

Contribution

It presents DARE, a new MPU design that co-optimizes hardware and algorithms to handle irregular sparse DNN computations more effectively.

Findings

01. Performance improved by up to 4.44×
02. Energy efficiency improved by up to 22.8×
03. Hardware overhead 3.91× lower than NVR

Abstract

Deep Neural Networks (DNNs) are widely applied across domains and have shown strong effectiveness. As DNN workloads increasingly run on CPUs, dedicated Matrix Processing Units (MPUs) and Matrix Instruction Set Architectures (ISAs) have been introduced. At the same time, sparsity techniques are widely adopted in algorithms to reduce computational cost. Despite these advances, insufficient hardware-algorithm co-optimization leads to suboptimal performance. On the memory side, sparse DNNs incur irregular access patterns that cause high cache miss rates. While runahead execution is a promising prefetching technique, its direct application to MPUs is often ineffective due to significant prefetch redundancy. On the compute side, stride constraints in current Matrix ISAs prevent the densification of multiple logically related sparse operations, resulting in poor utilization of MPU processing…
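The stride constraint can be illustrated with a toy packing model. This sketch assumes a hypothetical four-row tile register and compares two ways of loading a set of sparse, unevenly spaced row indices. The first is a base-plus-constant-stride load, as in the stride-constrained Matrix ISAs the abstract describes. The second is an index-based gather, which stands in here for densification. It is not DARE's instruction encoding, which this page does not describe. `TILE_ROWS`, the function names, and the row indices are illustrative assumptions.

```python
from typing import List

TILE_ROWS = 4  # assumed number of rows per matrix tile register (hypothetical)


def strided_tiles(rows: List[int]) -> List[List[int]]:
    """Greedily pack sorted, unique row indices into tiles that a single
    base + constant-stride load could fetch."""
    tiles: List[List[int]] = []
    i = 0
    while i < len(rows):
        tile = [rows[i]]
        if i + 1 < len(rows):
            stride = rows[i + 1] - rows[i]
            j = i + 1
            # Extend the tile while the spacing stays constant and it has room.
            while j < len(rows) and len(tile) < TILE_ROWS and rows[j] - tile[-1] == stride:
                tile.append(rows[j])
                j += 1
        tiles.append(tile)
        i += len(tile)
    return tiles


def gathered_tiles(rows: List[int]) -> List[List[int]]:
    """Pack arbitrary row indices densely, as an index-based gather could."""
    return [rows[k:k + TILE_ROWS] for k in range(0, len(rows), TILE_ROWS)]


def utilization(tiles: List[List[int]]) -> float:
    """Fraction of tile row slots that hold useful rows."""
    if not tiles:
        return 0.0
    used = sum(len(t) for t in tiles)
    return used / (len(tiles) * TILE_ROWS)


rows = [0, 3, 4, 9, 10, 11, 20]  # made-up nonzero row indices
print(len(strided_tiles(rows)), utilization(strided_tiles(rows)))    # 4 0.4375
print(len(gathered_tiles(rows)), utilization(gathered_tiles(rows)))  # 2 0.875
```

With stride-only loads, the seven rows need four tile operations at 43.75% slot utilization. Gathering them by index needs two at 87.5%, which is the kind of gap a densifying ISA aims to close.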

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Big Data and Digital Economy · Advanced Neural Network Applications