Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra

Paul Scheffler; Florian Zaruba; Fabian Schuiki; Torsten Hoefler; Luca Benini

arXiv:2011.08070·cs.AR·December 15, 2020


TL;DR

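This paper enhances a memory-streaming RISC-V ISA extension with streaming indirection to accelerate sparse-dense linear algebra, reaching speedups of up to 7.2x on a single core and, for matrix-vector products on a multi-core cluster, up to 5.8x higher speed and 2.7x better energy efficiency than an optimized baseline without the extension.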

Contribution

It extends an existing memory-streaming RISC-V ISA extension with streaming indirection and presents dot, matrix-vector, and matrix-matrix product kernels that achieve high FPU utilization on a single core and on a multi-core cluster.

Findings

01

Up to 80% single-core FPU utilization with the hardware extension.

02

Speedups of up to 7.2x on a single core and up to 5.8x for matrix-vector products on a multi-core cluster, each over an optimized baseline without the extension.

03

2.8x higher peak FP64 utilization compared to a GTX 1080 Ti GPU.

Abstract

Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we enhance a memory-streaming RISC-V ISA extension to accelerate sparse-dense products through streaming indirection. We present efficient dot, matrix-vector, and matrix-matrix product kernels using our hardware, enabling single-core FPU utilizations of up to 80% and speedups of up to 7.2x over an optimized baseline without extensions. A matrix-vector implementation on a multi-core cluster is up to 5.8x faster and 2.7x more energy-efficient with our kernels than an optimized baseline. We propose further uses for our indirection hardware, such as scatter-gather operations and codebook decoding, and compare our work to state-of-the-art CPU, GPU, and…
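
To make the indirect memory lookups mentioned in the abstract concrete, the following is a minimal sketch of a CSR matrix times dense vector product in plain Python. It is an illustration only, not the paper's kernels or hardware interface; the function name `csr_matvec` and the array names `row_ptr`, `col_idx`, and `vals` are assumptions chosen for this example.

```python
def csr_matvec(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[j] is the column of the j-th nonzero, vals[j] its value.
    All names here are hypothetical and chosen for illustration.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect lookup: the address into x is itself loaded from
            # memory (col_idx[j]), so it cannot be computed ahead of time
            # from the loop counter alone.
            acc += vals[j] * x[col_idx[j]]
        y[row] = acc
    return y


# Example matrix:
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_matvec(row_ptr, col_idx, vals, x)
print(y)
```

In the inner loop, `vals` and `col_idx` are read sequentially, while `x` is read at data-dependent positions. That data-dependent read is the indirection step the abstract identifies as the obstacle to efficient sparse-dense products.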
