Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems

Yufan Xia; Giuseppe Maria Junior Barca

arXiv:2406.19621 · cs.DC · July 1, 2024 · 1 citation

TL;DR

This paper introduces a machine learning approach to select the number of threads for BLAS Level 3 operations on modern multi-core systems, achieving 1.5x to 3.0x speedups over running with the maximum number of threads.

Contribution

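It extends the Architecture and Data-Structure Aware Linear Algebra (ADSALA) library to predict the optimal number of threads for every BLAS Level 3 routine from the matrix dimensions and the system architecture, and demonstrates speedups for all of these routines.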
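To make the selection step concrete, here is a minimal Python sketch, assuming the trained model can be queried as a function from the GEMM dimensions (m, n, k) and a candidate thread count to a predicted runtime. `toy_gemm_model` and `pick_threads` are hypothetical names, and the toy cost model (compute time that shrinks with more threads plus a linear per-thread overhead, with invented constants) only stands in for ADSALA's learned predictor, whose form is not described here.

```python
from typing import Callable

# Hypothetical interface: (m, n, k, threads) -> predicted runtime in seconds.
# ADSALA's actual model and inputs are not specified on this page.
RuntimeModel = Callable[[int, int, int, int], float]


def toy_gemm_model(m: int, n: int, k: int, threads: int) -> float:
    """Illustrative stand-in for a learned model (constants are invented).

    Runtime = parallel compute time + a synchronization cost that grows
    linearly with the number of threads.
    """
    flops = 2.0 * m * n * k          # GEMM performs about 2*m*n*k floating-point ops
    rate_per_thread = 1.0e10         # assumed FLOP/s achieved by one thread
    overhead_per_thread = 2.0e-5     # assumed per-thread overhead in seconds
    return flops / (rate_per_thread * threads) + overhead_per_thread * threads


def pick_threads(model: RuntimeModel, m: int, n: int, k: int, max_threads: int) -> int:
    """Return the thread count in [1, max_threads] with the lowest predicted runtime."""
    return min(range(1, max_threads + 1), key=lambda t: model(m, n, k, t))


if __name__ == "__main__":
    print(pick_threads(toy_gemm_model, 64, 64, 64, 64))        # small problem
    print(pick_threads(toy_gemm_model, 4096, 4096, 4096, 64))  # large problem
```

With these made-up constants, the 64 x 64 x 64 GEMM is assigned 2 threads, while the 4096 x 4096 x 4096 GEMM is assigned all 64: small problems are dominated by threading overhead, which is the regime where always using the maximum number of threads wastes time.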

Findings

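01. Achieved speedups of 1.5x to 3.0x for all BLAS Level 3 operations, compared with running the MKL and BLIS baselines at the maximum number of threads.

02. Validated on two HPC platforms with Intel and AMD processors, using MKL and BLIS as the baseline BLAS implementations.

03. Analyzed the runtime patterns of the different BLAS operations to explain the sources of the speedup.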
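The speedups above are measured against the maximum-thread configuration. Written out as a formula (the notation is ours, not necessarily the paper's), with $T(p)$ the measured runtime of an operation using $p$ threads, $p_{\max}$ the maximum number of threads, and $\hat{p}$ the predicted thread count:

$$
S = \frac{T(p_{\max})}{T(\hat{p})}
$$

For example, $S = 1.5$ means the predicted configuration finishes in $1/1.5 \approx 67\%$ of the maximum-thread runtime, and $S = 3.0$ means it finishes in one third of that time.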

Abstract

BLAS Level 3 operations are essential for scientific computing, but finding the optimal number of threads for multi-threaded implementations on modern multi-core systems is challenging. We present an extension to the Architecture and Data-Structure Aware Linear Algebra (ADSALA) library that uses machine learning to optimize the runtime of all BLAS Level 3 operations. Our method predicts the best number of threads for each operation based on the matrix dimensions and the system architecture. We test our method on two HPC platforms with Intel and AMD processors, using MKL and BLIS as baseline BLAS implementations. We achieve speedups of 1.5 to 3.0 for all operations, compared to using the maximum number of threads. We also analyze the runtime patterns of different BLAS operations and explain the sources of speedup. Our work shows the effectiveness and generality of the ADSALA approach for…
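The abstract states that predictions are driven by the matrix dimensions. As a hypothetical illustration of what such inputs could look like (the feature set ADSALA actually uses is not given here), the sketch below maps each real-valued BLAS Level 3 routine to its dimensions plus a standard leading-order FLOP count, assuming `side='L'` for SYMM, TRMM and TRSM. `FLOP_COUNT` and `features` are our own names, and the complex routines (HEMM, HERK, HER2K) are omitted.

```python
# Leading-order floating-point operation counts for the real BLAS Level 3
# routines, assuming side='L' for SYMM/TRMM/TRSM. Standard approximations,
# not ADSALA's actual feature set (which is an assumption here).
FLOP_COUNT = {
    "gemm":  lambda m, n, k: 2 * m * n * k,   # A: m x k, B: k x n, C: m x n
    "symm":  lambda m, n, k: 2 * m * m * n,   # A: m x m symmetric, B and C: m x n
    "syrk":  lambda m, n, k: n * n * k,       # A: n x k, C: n x n
    "syr2k": lambda m, n, k: 2 * n * n * k,   # A and B: n x k, C: n x n
    "trmm":  lambda m, n, k: m * m * n,       # A: m x m triangular, B: m x n
    "trsm":  lambda m, n, k: m * m * n,       # A: m x m triangular, B: m x n
}


def features(routine: str, m: int = 0, n: int = 0, k: int = 0) -> list[float]:
    """Build a hypothetical feature vector: raw dimensions plus approximate FLOPs.

    Dimensions a routine does not use (e.g. k for TRSM) are left at 0.
    """
    flops = FLOP_COUNT[routine](m, n, k)
    return [float(m), float(n), float(k), float(flops)]


if __name__ == "__main__":
    print(features("gemm", 1000, 2000, 300))
    print(features("trsm", m=500, n=100))
```

Appending the FLOP count gives a learned model a direct measure of the work per call, while keeping the raw dimensions lets it pick up shape effects (for example, tall-and-skinny versus square matrices) that the FLOP count alone hides.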
