TL;DR
This paper introduces an open-source framework that generates numerically-tailored matrix multiplication kernels, improving energy efficiency and accuracy in HPC workloads such as AI inference and Sea Surface Height (SSH) computation without requiring changes to user code.
Contribution
The framework automates the generation of customizable systolic matrix multiplication kernels and integrates them into existing codebases, regardless of programming language and without code modifications, improving accuracy per unit of energy.
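To make "systolic" concrete, here is a minimal cycle-level sketch of an output-stationary systolic array, a common textbook dataflow chosen here for illustration only. The function name `systolic_matmul` is hypothetical, and the kernels generated by the paper's framework may use a different dataflow, array shape, and arithmetic. Each processing element (PE) keeps one output entry, receives operands from its left and upper neighbours, and forwards them one hop per cycle.

```python
def systolic_matmul(A, B):
    """Cycle-level sketch of an n x m output-stationary systolic array computing A @ B.

    A is n x k and B is k x m (lists of lists). PE (i, j) accumulates C[i][j] in place.
    Row i of A enters from the left delayed by i cycles; column j of B enters from the
    top delayed by j cycles, so A[i][s] and B[s][j] meet at PE (i, j) in cycle i + j + s.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]         # stationary partial sums, one per PE
    a_reg = [[None] * m for _ in range(n)]    # operand each PE forwards to the right
    b_reg = [[None] * m for _ in range(n)]    # operand each PE forwards downward

    for t in range(n + m + k - 2):            # cycles until the last operand pair meets
        new_a = [[None] * m for _ in range(n)]
        new_b = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left input: skewed edge feed for column 0, otherwise the left neighbour's register.
                if j == 0:
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else None
                else:
                    a_in = a_reg[i][j - 1]
                # Top input: skewed edge feed for row 0, otherwise the upper neighbour's register.
                if i == 0:
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else None
                else:
                    b_in = b_reg[i - 1][j]
                # Multiply-accumulate when a valid operand pair arrives.
                if a_in is not None and b_in is not None:
                    acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b           # all PEs update their registers in lockstep
    return acc
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]` after n + m + k - 2 = 4 cycles. The skewed feeding is what lets every PE work with only nearest-neighbour communication, which is why this structure maps well onto hardware.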
Findings
Reduces energy consumption in AI inference by up to 3.3x.
Improves SSH computation accuracy by over 5x compared to standard IEEE formats.
Systematically improves accuracy per energy cost across diverse HPC workloads.
Abstract
We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications. The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161,…
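Why tailor the arithmetic to each workload? A minimal sketch, assuming a generic floating-point model with a configurable significand width p (exponent range ignored): the hypothetical helpers `round_to_precision` and `dot_tailored` emulate a dot product whose inputs and accumulator are rounded to independently chosen widths. These formats are illustrative only and are not the number formats evaluated in the paper.

```python
import math


def round_to_precision(x, p):
    """Round x to the nearest value with a p-bit significand (ties to even).

    Exponent range limits are ignored: no overflow or underflow is modelled.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)           # x = m * 2**e with 0.5 <= |m| < 1
    scaled = math.ldexp(m, p)      # significand scaled to p integer bits
    return math.ldexp(round(scaled), e - p)  # Python's round() breaks ties to even


def dot_tailored(x, y, p_in, p_acc):
    """Dot product with inputs rounded to p_in bits and each product and
    partial sum rounded to p_acc bits (binary64 is the working precision)."""
    acc = 0.0
    for a, b in zip(x, y):
        prod = round_to_precision(round_to_precision(a, p_in) * round_to_precision(b, p_in), p_acc)
        acc = round_to_precision(acc + prod, p_acc)
    return acc


# One large term followed by many small ones.
x = [1.0] + [2.0 ** -9] * 256
y = [1.0] * 257
narrow = dot_tailored(x, y, p_in=8, p_acc=8)   # small terms are swamped
wide = dot_tailored(x, y, p_in=8, p_acc=24)    # wider accumulator keeps them
```

With an 8-bit accumulator, each 2^-9 term is a quarter of the unit in the last place of 1.0 and rounds away, so `narrow` stays at 1.0, while the 24-bit accumulator returns the exact 1.5. Keeping the inputs narrow while widening only the accumulator is one example of matching each part of the datapath to the numerical needs of the workload.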
Taxonomy
Methods: Sparse Evolutionary Training
