Tackling the Matrix Multiplication Micro-kernel Generation with Exo
Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez

TL;DR
This paper introduces a method using the Exo compiler to automatically generate high-performance, portable micro-kernels for matrix multiplication, reducing development effort and maintaining or improving performance compared to manual coding.
Contribution
The paper presents a step-by-step procedure for automatic micro-kernel generation with Exo, enhancing portability and performance over traditional manual approaches.
Findings
Generated micro-kernels perform close to, and in some cases better than, hand-crafted ones.
The approach simplifies adapting to new hardware architectures.
Code portability is significantly improved.
Abstract
Optimizing matrix multiplication (GEMM) has been a persistent need over the last decades. This operation is considered the flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel oneAPI because of its widespread use in a large variety of scientific applications. GEMM is usually implemented following the GotoBLAS philosophy, which tiles the GEMM operands and uses a series of nested loops to improve performance. These approaches extract the maximum computational power of the architecture through small pieces of hardware-oriented, high-performance code called micro-kernels. However, this approach forces developers to write, with non-negligible effort, a dedicated micro-kernel for each new hardware platform. In this work, we present a step-by-step procedure for generating micro-kernels with the Exo compiler that performs close to (or even better than)…
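To make the GotoBLAS-style structure mentioned in the abstract concrete, here is a minimal pure-Python sketch of a tiled GEMM whose innermost work is delegated to a small micro-kernel. It is only an illustration of the loop nest, not code produced by Exo and not the authors' implementation: the function names and the block sizes MC, NC, KC, MR, NR are assumptions chosen for readability, and the operand packing that real libraries perform is omitted.

```python
# Hypothetical block sizes (assumptions for illustration only). Real libraries
# tune these to the cache hierarchy and register file of each architecture.
MC, NC, KC = 4, 6, 3   # cache-level tile sizes
MR, NR = 2, 3          # register-level micro-tile sizes


def micro_kernel(A, B, C, i0, j0, p0, mr, nr, kc):
    """Update the mr x nr tile of C at (i0, j0) with kc rank-1 updates.

    In a real library this is the hardware-specific piece, typically written
    with vector intrinsics or assembly.
    """
    for p in range(p0, p0 + kc):
        for i in range(i0, i0 + mr):
            a_ip = A[i][p]
            for j in range(j0, j0 + nr):
                C[i][j] += a_ip * B[p][j]


def gemm_blocked(A, B, C):
    """Compute C += A * B using a GotoBLAS-style loop nest (packing omitted)."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, NC):                  # column panels of B and C
        nc = min(NC, n - jc)
        for pc in range(0, k, KC):              # slices along the shared dimension
            kc = min(KC, k - pc)
            for ic in range(0, m, MC):          # row blocks of A and C
                mc = min(MC, m - ic)
                for jr in range(jc, jc + nc, NR):       # micro-tile columns
                    nr = min(NR, jc + nc - jr)
                    for ir in range(ic, ic + mc, MR):   # micro-tile rows
                        mr = min(MR, ic + mc - ir)
                        micro_kernel(A, B, C, ir, jr, pc, mr, nr, kc)
    return C


def gemm_reference(A, B):
    """Plain triple-loop product, used only to check the tiled version."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

Only micro_kernel depends on the target hardware in this layout; the surrounding loops stay the same across architectures, which is why the micro-kernel is the piece worth generating automatically.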
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design · Advanced Data Storage Technologies
