JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication

Qiang Fu, Thomas B. Rolinger, and H. Howie Huang

arXiv:2312.05639 · cs.DC · December 12, 2023 · 1 citation

TL;DR

JITSPMM is a just-in-time (JIT) assembly code generation framework that uses runtime information to optimize sparse matrix-matrix multiplication (SpMM) on multi-core CPUs. It addresses limitations of traditional ahead-of-time (AOT) compilation and reports an average 3.8x speedup over AOT baselines and 1.4x over Intel MKL.

Contribution

The paper presents JITSPMM, a JIT assembly code generation approach that exploits information available only at runtime to optimize SpMM computation, achieving better workload balance and instruction-level parallelism.
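To make "workload balance" concrete, here is a minimal sketch of one common heuristic: splitting the rows of a CSR matrix into contiguous ranges with roughly equal nonzero counts, so that each thread does a similar amount of work. The function name `partition_rows_by_nnz` and its parameters are hypothetical, and this is an illustration under those assumptions, not necessarily one of the schemes the paper uses.

```python
from bisect import bisect_left


def partition_rows_by_nnz(indptr, n_threads):
    """Split CSR rows into n_threads contiguous ranges of similar nonzero count.

    indptr is the CSR row-pointer array (length n_rows + 1, non-decreasing).
    Returns a list of (row_start, row_end) half-open ranges, one per thread.
    Hypothetical helper for illustration only.
    """
    n_rows = len(indptr) - 1
    total_nnz = indptr[-1]
    bounds = [0]
    for t in range(1, n_threads):
        # Ideal number of nonzeros that should precede the t-th cut.
        target = total_nnz * t // n_threads
        # First row boundary whose prefix nonzero count reaches the target;
        # never move backwards past the previous cut.
        cut = bisect_left(indptr, target, lo=bounds[-1], hi=n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return [(bounds[t], bounds[t + 1]) for t in range(n_threads)]


if __name__ == "__main__":
    # Row 0 holds 8 of the 12 nonzeros, so an equal-row split would be skewed.
    indptr = [0, 8, 9, 10, 12]
    print(partition_rows_by_nnz(indptr, 2))
```

With this input, splitting by row count would give the two threads 9 and 3 nonzeros, whereas the nonzero-based cut gives them 8 and 4. A single very dense row can still dominate one range, which is why finer-grained schemes exist.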

Findings

01 JITSPMM achieves an average 3.8x speedup over AOT-compiled baselines.

02 It outperforms Intel MKL's SpMM routine by 1.4x.

03 The framework reduces memory accesses and improves SIMD utilization.

Abstract

Achieving high performance for Sparse Matrix-Matrix Multiplication (SpMM) has received increasing research attention, especially on multi-core CPUs, due to the large input data size in applications such as graph neural networks (GNNs). Most existing solutions for SpMM computation follow the ahead-of-time (AOT) compilation approach, which compiles a program entirely before it is executed. AOT compilation for SpMM faces three key limitations: unnecessary memory access, additional branch overhead, and redundant instructions. These limitations stem from the fact that crucial information pertaining to SpMM is not known until runtime. In this paper, we propose JITSPMM, a just-in-time (JIT) assembly code generation framework to accelerate SpMM computation on multi-core CPUs with SIMD extensions. First, JITSPMM integrates the JIT assembly code generation technique into three widely used workload…
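The AOT limitations described in the abstract can be illustrated with a small Python analogue. This is a sketch, not the paper's assembly generator: `spmm_csr_generic` and `generate_spmm_kernel` are hypothetical names, and Python source generation stands in for emitting machine instructions. The generic kernel must loop over the dense matrix's columns with a bound known only at runtime. The generated kernel is specialized once that width is known: the column loop is fully unrolled and partial sums are held in local accumulators, so the output row is written once per row rather than once per nonzero.

```python
def spmm_csr_generic(indptr, indices, data, B, n_cols):
    """AOT-style CSR SpMM: C = A @ B with the dense width as a runtime variable."""
    n_rows = len(indptr) - 1
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            a, j = data[k], indices[k]
            for c in range(n_cols):  # loop bound and branch resolved at runtime
                C[i][c] += a * B[j][c]  # output row touched for every nonzero
    return C


def generate_spmm_kernel(n_cols):
    """Generate and compile a kernel specialized for a known dense width.

    Hypothetical illustration: emits Python source rather than assembly.
    """
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    acc = [f"acc{c}" for c in range(n_cols)]
    lines = [
        "def kernel(indptr, indices, data, B, C):",
        "    for i in range(len(indptr) - 1):",
        "        " + " = ".join(acc) + " = 0.0",
        "        for k in range(indptr[i], indptr[i + 1]):",
        "            a = data[k]",
        "            row = B[indices[k]]",
    ]
    # Unrolled multiply-accumulate: one statement per dense column.
    lines += [f"            acc{c} += a * row[{c}]" for c in range(n_cols)]
    # Each output element is stored exactly once per row.
    lines += [f"        C[i][{c}] = acc{c}" for c in range(n_cols)]
    namespace = {}
    exec(compile("\n".join(lines), "<jit_spmm>", "exec"), namespace)
    return namespace["kernel"]


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0], [0, 0, 0]] in CSR form.
    indptr, indices, data = [0, 2, 3, 3], [0, 2, 1], [1.0, 2.0, 3.0]
    B = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    kernel = generate_spmm_kernel(4)  # width known only at "runtime"
    C = [[0.0] * 4 for _ in range(3)]
    kernel(indptr, indices, data, B, C)
    print(C)
```

In a real JIT setting the generated kernel would be cached per dense width, so the one-time generation cost is amortized over repeated multiplications.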

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Low-power high-performance VLSI design