A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication
Yufan Xia, Marco De La Pierre, Amanda S. Barnard, Giuseppe Maria Junior Barca

TL;DR
This paper introduces ADSALA, a proof-of-concept machine learning-based library that selects the number of threads for GEMM (general matrix multiplication) at runtime on multi-core systems, achieving speedups of 25-40% over traditional GEMM implementations.
Contribution
It presents a machine learning approach that automatically selects the number of threads for each GEMM task at runtime, improving performance on modern multi-core HPC nodes.
Findings
Achieved 25-40% speedup over traditional GEMM implementations.
Effective on both Intel Cascade Lake and AMD Zen 3 architectures.
The optimization is most beneficial for GEMM tasks that use up to 100 MB of memory.
Abstract
The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well-optimised with techniques like blocking and autotuning. However, due to the complexity of modern multi-core shared memory systems, it is challenging to determine the number of threads that minimises the multi-thread GEMM runtime. We present a proof-of-concept approach to building an Architecture and Data-Structure Aware Linear Algebra (ADSALA) software library that uses machine learning to optimise the runtime performance of BLAS routines. More specifically, our method uses a machine learning model on-the-fly to automatically select the optimal number of threads for a given GEMM task based on the collected training data. Test results on two different HPC node architectures, one based on a two-socket Intel Cascade Lake and the other on a…
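The thread-selection idea in the abstract can be illustrated with a minimal sketch: predict the runtime of a GEMM task for each candidate thread count, then pick the count with the lowest prediction. Everything below is an assumption made for illustration, not ADSALA's actual model, features, or data. The names `synthetic_runtime`, `predict_runtime`, and `select_threads` are hypothetical, the timings come from a made-up cost model rather than real measurements, and a nearest-neighbour regressor stands in for whatever model the authors trained.

```python
import math

# Hypothetical candidate thread counts for one node.
THREAD_COUNTS = [1, 2, 4, 8, 16, 32, 48]


def synthetic_runtime(m, k, n, threads):
    """Made-up cost model standing in for measured GEMM timings (seconds)."""
    flops = 2.0 * m * k * n
    compute = flops / (1e9 * threads)  # assumed ideal scaling, 1 GFLOP/s per thread
    overhead = 2e-5 * threads          # assumed per-thread synchronisation cost
    return compute + overhead


def build_training_set(sizes, thread_counts):
    """Return [((m, k, n, threads), runtime), ...] over a grid of shapes."""
    return [
        ((m, k, n, t), synthetic_runtime(m, k, n, t))
        for m in sizes for k in sizes for n in sizes for t in thread_counts
    ]


def features(m, k, n, threads):
    """Log-scaled features so small and large shapes are comparable."""
    return (math.log2(m), math.log2(k), math.log2(n), math.log2(threads))


def predict_runtime(train, m, k, n, threads, neighbours=1):
    """Nearest-neighbour regression: average runtime of the closest samples."""
    query = features(m, k, n, threads)
    nearest = sorted(train, key=lambda s: math.dist(query, features(*s[0])))[:neighbours]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def select_threads(train, m, k, n, thread_counts=THREAD_COUNTS):
    """Pick the candidate thread count with the lowest predicted runtime."""
    return min(thread_counts, key=lambda t: predict_runtime(train, m, k, n, t))


# "Training data" generated from the synthetic cost model above.
TRAIN = build_training_set([64, 128, 256, 512, 1024], THREAD_COUNTS)

small = select_threads(TRAIN, 64, 64, 64)
large = select_threads(TRAIN, 1024, 1024, 1024)
print(small, large)
```

Under this made-up cost model, the small 64×64×64 task is assigned fewer threads than the 1024×1024×1024 task, because the assumed per-thread overhead outweighs the parallel gain when there is little work to share. In a real setting the chosen count would then be passed to the BLAS library's threading control before calling GEMM.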
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Low-power high-performance VLSI design
