dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference

Elias Trommer; Bernd Waschneck; Akash Kumar

arXiv:2111.12345 · cs.DS · November 25, 2021

TL;DR

This paper introduces dCSR, a memory-efficient sparse matrix format optimized for neural network inference on embedded SIMD-capable microcontrollers, achieving up to 2.9x higher throughput than dense methods for sparse matrix-vector multiplication.

Contribution

We develop Delta-Compressed Storage Row (dCSR), a novel sparse matrix representation that reduces memory overhead and accelerates inference on embedded SIMD hardware.
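The contribution rests on storing sparse indices compactly. As a minimal illustration of the general idea, and assuming nothing about dCSR's actual bit layout, the Python sketch below stores a matrix in CSR form but records each column index as the gap (delta) from the previous stored entry in the same row, so that indices fit in a narrow field. The names `DeltaCSR`, `encode_delta_csr` and `storage_bytes`, the 4-bit default (`max_delta=15`), and the zero-padding strategy for gaps that overflow the field are hypothetical choices for this example, not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DeltaCSR:
    """Illustrative delta-encoded CSR container (hypothetical layout, not dCSR's)."""
    values: List[float]   # nonzeros (plus zero padding entries), row-major
    deltas: List[int]     # column gap to the previous stored entry in the row
    row_ptr: List[int]    # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    n_cols: int


def encode_delta_csr(dense: List[List[float]], max_delta: int = 15) -> DeltaCSR:
    """Encode a dense matrix so that every delta lies in [0, max_delta].

    max_delta=15 assumes a hypothetical 4-bit unsigned delta field.
    Gaps larger than max_delta are bridged with explicit zero-valued entries.
    """
    values: List[float] = []
    deltas: List[int] = []
    row_ptr = [0]
    for row in dense:
        prev_col = 0  # the first delta in each row is measured from column 0
        for col, v in enumerate(row):
            if v == 0:
                continue
            while col - prev_col > max_delta:
                values.append(0.0)        # padding entry contributes nothing
                deltas.append(max_delta)
                prev_col += max_delta
            values.append(v)
            deltas.append(col - prev_col)
            prev_col = col
        row_ptr.append(len(values))
    n_cols = len(dense[0]) if dense else 0
    return DeltaCSR(values, deltas, row_ptr, n_cols)


def storage_bytes(m: DeltaCSR, value_bytes: int, index_bits: int, ptr_bytes: int) -> int:
    """Rough footprint: values + bit-packed indices + row pointers."""
    index_bytes = math.ceil(len(m.deltas) * index_bits / 8)
    return len(m.values) * value_bytes + index_bytes + len(m.row_ptr) * ptr_bytes


if __name__ == "__main__":
    m = encode_delta_csr([[0, 5, 0, 0], [0, 0, 0, 0], [1, 0, 0, 2]])
    print(m.values, m.deltas, m.row_ptr)  # [5, 1, 2] [1, 0, 3] [0, 1, 1, 3]
    print(storage_bytes(m, value_bytes=1, index_bits=4, ptr_bytes=2))   # 13
    print(storage_bytes(m, value_bytes=1, index_bits=32, ptr_bytes=2))  # 23
```

On this 3x4 example, packing the deltas into 4 bits instead of storing 32-bit absolute column indices shrinks the index array from 12 bytes to 2 bytes. The trade-off is that very sparse rows may need padding entries, each of which costs a value slot.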

Findings

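1. Achieves up to 2.9x higher throughput than dense methods for sparse matrix-vector multiplication (SpMV)
2. Provides competitive compression ratios
3. Demonstrates effectiveness on an ARM Cortex-M55 MCU prototype with M-Profile Vector Extension (MVE)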

Abstract

Reducing the memory footprint of neural networks is a crucial prerequisite for deploying them in small and low-cost embedded devices. Network parameters can often be reduced significantly through pruning. We discuss how to best represent the indexing overhead of sparse networks for the coming generation of Single Instruction, Multiple Data (SIMD)-capable microcontrollers. From this, we develop Delta-Compressed Storage Row (dCSR), a storage format for sparse matrices that allows for both low overhead storage and fast inference on embedded systems with wide SIMD units. We demonstrate our method on an ARM Cortex-M55 MCU prototype with M-Profile Vector Extension (MVE). A comparison of memory consumption and throughput shows that our method achieves competitive compression ratios and increases throughput over dense methods by up to $2.9 \times$ for sparse matrix-vector multiplication…
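The abstract ties throughput to how well a sparse format suits wide SIMD units. As a hedged, scalar sketch of one way an SpMV kernel might consume delta-encoded column indices, the Python code below walks each row in fixed-width chunks: an inclusive prefix sum over a chunk's deltas recovers absolute column indices, which are used to gather from `x` before a multiply-accumulate. The chunk width `LANES`, the function names, and the array layout (`values`, `deltas`, `row_ptr`, with each row's first delta measured from column 0) are assumptions for illustration; this does not reproduce the dCSR kernel or any MVE intrinsics.

```python
from itertools import accumulate
from typing import List, Sequence

LANES = 8  # hypothetical vector width (e.g. eight 16-bit lanes in a 128-bit register)


def spmv_delta_csr(values: Sequence[float], deltas: Sequence[int],
                   row_ptr: Sequence[int], x: Sequence[float],
                   lanes: int = LANES) -> List[float]:
    """Compute y = A @ x for a delta-encoded CSR matrix, one lane-sized chunk at a time."""
    y: List[float] = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        acc = 0.0
        base = 0  # absolute column of the last decoded entry in this row
        for s in range(start, end, lanes):
            stop = min(s + lanes, end)
            # Inclusive prefix sum over the chunk's deltas yields absolute columns.
            cols = [base + c for c in accumulate(deltas[s:stop])]
            # Gather x at those columns and multiply-accumulate with the values.
            acc += sum(v * x[c] for v, c in zip(values[s:stop], cols))
            base = cols[-1]
        y.append(acc)
    return y


def dense_matvec(a: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Reference dense matrix-vector product for checking results."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


if __name__ == "__main__":
    # Delta-encoded form of [[0, 5, 0, 0], [0, 0, 0, 0], [1, 0, 0, 2]]
    values, deltas, row_ptr = [5, 1, 2], [1, 0, 3], [0, 1, 1, 3]
    x = [1, 2, 3, 4]
    print(spmv_delta_csr(values, deltas, row_ptr, x))                   # [10.0, 0.0, 9.0]
    print(dense_matvec([[0, 5, 0, 0], [0, 0, 0, 0], [1, 0, 0, 2]], x))  # [10, 0, 9]
```

A SIMD kernel would perform the per-chunk prefix sum, gather, and multiply-accumulate with vector instructions; this scalar version only mirrors that chunked structure, and zero-valued padding entries simply contribute nothing to the sum.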

Code & Models

Repositories

etrommer/dcsr · TensorFlow · Official

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices