A matrix math facility for Power ISA(TM) processors
José E. Moreira, Kit Barton, Steven Battle, Peter Bergner, Ramon Bertran, Puneeth Bhat, Pedro Caldeira, David Edelsohn, Gordon Fossum, Brad Frey, Nemanja Ivanovic, Chip Kerchner, Vincent Lim, Shakti Kapoor, Tulio Machado Filho, Silvia Melitta Mueller, Brett Olsson

TL;DR
The paper introduces new matrix math instructions in Power ISA v3.1, enabling efficient linear algebra operations that significantly boost performance and efficiency in POWER10 processors.
Contribution
It presents the design and implementation of the Matrix-Multiply Assist instructions and demonstrates their impact on processor performance and efficiency.
Findings
Performance per core is 4 times higher than on the previous-generation POWER9 processor at the same frequency.
The instructions enable power- and area-efficient matrix computations.
Compiler built-ins effectively leverage the new instructions.
Abstract
Power ISA(TM) Version 3.1 has introduced a new family of matrix math instructions, collectively known as the Matrix-Multiply Assist (MMA) facility. The instructions in this facility implement numerical linear algebra operations on small matrices and are meant to accelerate computation-intensive kernels, such as matrix multiplication, convolution and discrete Fourier transform. These instructions have led to a power- and area-efficient implementation of a high throughput math engine in the future POWER10 processor. Performance per core is 4 times better, at constant frequency, than the previous generation POWER9 processor. We also advocate the use of compiler built-ins as the preferred way of leveraging these instructions, which we illustrate through case studies covering matrix multiplication and convolution.
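To make the idea of "linear algebra operations on small matrices" concrete, here is a minimal scalar sketch of how a small matrix product can be built from repeated outer-product (rank-1) updates into an accumulator tile. It is only a portable model of that arithmetic pattern: the function names and the 4x4 tile size are assumptions chosen for illustration, and the code does not use or reproduce the MMA instructions or the compiler built-ins the paper advocates.

```python
# Scalar model of building a small matrix product from rank-1 updates.
# All names and the tile size are illustrative assumptions, not an MMA API.

TILE = 4  # assumed accumulator tile dimension for this sketch


def zero_acc(rows=TILE, cols=TILE):
    """Return a rows x cols accumulator tile filled with zeros."""
    return [[0.0] * cols for _ in range(rows)]


def outer_product_accumulate(acc, x, y):
    """In-place rank-1 update: acc[i][j] += x[i] * y[j]."""
    for i, xi in enumerate(x):
        row = acc[i]
        for j, yj in enumerate(y):
            row[j] += xi * yj
    return acc


def tile_matmul(a, b):
    """Compute A @ B for A (m x k) and B (k x n) as a sum of k outer products."""
    acc = zero_acc(len(a), len(b[0]))
    for p in range(len(b)):
        col = [a_row[p] for a_row in a]  # p-th column of A
        row = b[p]                       # p-th row of B
        outer_product_accumulate(acc, col, row)
    return acc
```

For example, `tile_matmul(a, b)` with a 4x2 matrix `a` and a 2x4 matrix `b` performs two rank-1 updates on one 4x4 accumulator. Keeping the accumulator tile resident while streaming columns of A and rows of B past it is the general pattern a compute kernel would follow; how that maps to actual registers and instructions is described in the paper itself.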
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Low-power high-performance VLSI design
Methods: Convolution
