Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

Saeed Maleki

arXiv:2310.06178 · cs.PF · October 11, 2023

Open Access

TL;DR

This paper introduces msGeMM, an algorithm that performs low-precision AI matrix multiplications (GeMMs) with ~2.5x fewer multiplication and addition instructions. An efficient implementation requires special CUDA cores that can add elements from a small look-up table at the rate of Tensor Cores.

Contribution

The paper presents msGeMM, a new algorithm that reduces the number of multiplication and addition instructions in low-precision AI matrix multiplications by ~2.5x.

Findings

01

Runs low-precision AI GeMMs with ~2.5x fewer multiplication and addition instructions

02

Relies on adding elements from a small look-up table

03

Requires special CUDA cores that perform these look-up additions at the rate of Tensor Cores for an efficient implementation

Abstract

AI models are increasing in size, and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to reach the same model quality for both training and inference. Following these trends, GPU vendors such as NVIDIA and AMD have added hardware support for fp16, fp8 and int8 GeMM operations with exceptional performance via Tensor Cores. However, this paper proposes a new algorithm called msGeMM, which shows that AI models with low-precision datatypes can run with ~2.5x fewer multiplication and addition instructions. An efficient implementation of this algorithm requires special CUDA cores that can add elements from a small look-up table at the rate of Tensor Cores.
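
The abstract does not spell out how msGeMM works, so the following is only a minimal sketch of the general look-up-table idea it alludes to, not the paper's algorithm. It assumes int4 weights in the range -8..7 and a hypothetical group size `d`. For each group of `d` activations, it precomputes the partial dot product with every possible tuple of `d` weights, so that each output row then needs only table look-ups and additions. The names `build_tables`, `lut_matvec`, and `reference_matvec` are illustrative.

```python
from itertools import product

# Assumption: int4 weights are signed and take one of these 16 values.
LEVELS = list(range(-8, 8))


def build_tables(x, d):
    """For each group of d consecutive entries of x, precompute the dot
    product with every possible tuple of d int4 weights (16**d entries)."""
    assert len(x) % d == 0, "length of x must be a multiple of d"
    tables = []
    for g in range(0, len(x), d):
        chunk = x[g:g + d]
        tables.append({
            ws: sum(w * xi for w, xi in zip(ws, chunk))
            for ws in product(LEVELS, repeat=d)
        })
    return tables


def lut_matvec(W, x, d=2):
    """Compute W @ x using only table look-ups and additions per row."""
    tables = build_tables(x, d)  # built once, shared by every row of W
    y = []
    for row in W:
        acc = 0
        for table, g in zip(tables, range(0, len(row), d)):
            acc += table[tuple(row[g:g + d])]  # one look-up replaces d multiply-adds
        y.append(acc)
    return y


def reference_matvec(W, x):
    """Plain multiply-add reference used to check the look-up version."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


W = [[1, -8, 7, 0], [3, 2, -1, -4]]
x = [2, 3, 5, 7]
print(lut_matvec(W, x), reference_matvec(W, x))
```

In this sketch the multiplications happen only while building the tables, and that cost is shared across all rows of `W`. Each row then performs `len(x) / d` look-up additions instead of `len(x)` multiply-adds. Whether this pays off depends on the table size (16**d entries per group) and on how fast the hardware can read from the table, which matches the abstract's point about needing dedicated cores.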

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Advanced Data Storage Technologies