Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels
Vikas Natesh, Andrew Sabot, H.T. Kung, Mark Ting

TL;DR
Rosko is a row-skipping outer product technique that reduces computation and memory access in sparse matrix multiplication for neural networks. Its CPU kernels outperform auto-tuning, search-based, and vendor-optimized libraries.
Contribution
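The paper presents Rosko, a sparse matrix multiplication (SpMM) kernel that skips entire row computations during execution, reducing computation and memory access. Its kernels are derived analytically from hardware characteristics, so no auto-tuning or search is required.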
Findings
Achieves up to 6.5x runtime reduction on CPUs
Effectively handles sparsities from 65% to 99.8%
Outperforms existing auto-tuning and vendor-optimized libraries
Abstract
We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels in reducing computation and memory access requirements of deep neural networks (DNNs). Rosko allows skipping of entire row computations during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to effectively utilize processor cores and minimize data movement without the need for auto-tuning or search space exploration. Rosko can be integrated with other outer product scheduling methods, allowing them to leverage row skipping by using Rosko's packing format to skip unnecessary computation. Rosko kernels outperform existing auto-tuning and search-based solutions as well as state-of-the-art vendor-optimized libraries on real hardware across a variety of neural network workloads. For…
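The row-skipping idea in the abstract can be illustrated with a minimal sketch. It assumes dense Python lists as inputs and is not the paper's packed, hardware-tuned kernel; the function name `row_skipping_outer_product` and the `skipped` counter are hypothetical and used only for illustration. The product C = A·B is accumulated as a sum of outer products (column k of A with row k of B), and whenever an entry of A is zero, the update to the corresponding row of C is skipped entirely.

```python
def row_skipping_outer_product(A, B):
    """Compute C = A @ B as a sum of outer products, skipping row updates
    wherever the sparse operand A has a zero entry.

    Illustrative sketch only: A and B are dense lists of lists, whereas a
    real kernel would use a packed sparse format and vectorized updates.
    """
    M = len(A)
    K = len(B)
    N = len(B[0]) if K else 0
    C = [[0.0] * N for _ in range(M)]
    skipped = 0  # number of row updates avoided (hypothetical diagnostic)

    for k in range(K):            # one outer product per k
        b_row = B[k]              # row k of B, reused for every row of C
        for i in range(M):
            a = A[i][k]           # entry i of column k of A
            if a == 0:
                skipped += 1      # whole row update C[i, :] += 0 * b_row is skipped
                continue
            c_row = C[i]
            for j in range(N):
                c_row[j] += a * b_row[j]
    return C, skipped


if __name__ == "__main__":
    A = [[1, 0],
         [0, 0],
         [2, 3]]
    B = [[1, 2],
         [3, 4]]
    C, skipped = row_skipping_outer_product(A, B)
    print(C, skipped)
```

In this sketch, the work saved grows with the number of zeros in A, since each zero removes N multiply-accumulate operations and the associated accesses to a row of C.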
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
