DGEMM on Integer Matrix Multiplication Unit
Hiroyuki Ootomo, Katsuhisa Ozaki, Rio Yokota

TL;DR
This paper explores using the integer matrix multiplication units (IMMUs) found in deep learning hardware to accelerate high-precision matrix multiplication for HPC applications. It reports faster double-precision matrix multiplication than cuBLAS and a speedup of up to 4.33 times in quantum circuit simulation, both while maintaining FP64 accuracy.
Contribution
It applies the Ozaki scheme to IMMUs to compute high-precision matrix multiplication, and shows the advantages and disadvantages of doing so compared with cuBLAS and an existing Ozaki scheme implementation based on FP16 Tensor Cores.
Findings
Integer Tensor Cores compute double-precision matrix multiplication faster than cuBLAS.
Quantum circuit simulation runs up to 4.33 times faster while maintaining FP64 accuracy.
The Ozaki scheme uses low-precision computing units to produce high-precision results.
Abstract
Deep learning hardware achieves high throughput and low power consumption by reducing computing precision and specializing in matrix multiplication. For machine learning inference, fixed-point value computation is commonplace, where the input and output values and the model parameters are quantized. Thus, many processors are now equipped with fast integer matrix multiplication units (IMMU). It is of significant interest to find a way to harness these IMMUs to improve the performance of HPC applications while maintaining accuracy. We focus on the Ozaki scheme, which computes a high-precision matrix multiplication by using lower-precision computing units, and show the advantages and disadvantages of using IMMU. The experiment using integer Tensor Cores shows that we can compute double-precision matrix multiplication faster than cuBLAS and an existing Ozaki scheme implementation on FP16…
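The idea the abstract describes, building a high-precision product out of exact low-precision integer products, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 7-bit slice width, and the slice count are illustrative assumptions, and NumPy's int32 matrix product stands in for a hardware IMMU. Each row of A and each column of B is scaled by a power of two, the scaled values are cut into small integer slices, every pair of slices is multiplied exactly in integer arithmetic, and the partial products are accumulated in FP64.

```python
import numpy as np

def split_into_int8_slices(X, num_slices, bits, axis):
    """Split X into int8 slices S_0, S_1, ... such that
    X ~= scale * sum_i S_i * 2**(-bits * (i + 1)).
    axis=1 scales each row (left operand), axis=0 each column (right operand)."""
    assert 1 <= bits <= 7, "each slice must fit in a signed 8-bit integer"
    max_abs = np.max(np.abs(X), axis=axis, keepdims=True)
    _, exponent = np.frexp(max_abs)      # max_abs = m * 2**exponent, 0.5 <= m < 1
    scale = np.ldexp(1.0, exponent)      # power of two strictly above max_abs
    residual = X / scale                 # |residual| < 1
    slices = []
    for _ in range(num_slices):
        residual = residual * 2.0 ** bits
        digit = np.trunc(residual)       # integer part, |digit| < 2**bits
        slices.append(digit.astype(np.int8))
        residual = residual - digit      # remaining fraction, |residual| < 1
    return slices, scale

def sliced_int_matmul(A, B, num_slices=8, bits=7):
    """Approximate the FP64 product A @ B using only integer matrix products.
    num_slices and bits are illustrative defaults (8 * 7 = 56 bits kept)."""
    A_slices, a_scale = split_into_int8_slices(A, num_slices, bits, axis=1)
    B_slices, b_scale = split_into_int8_slices(B, num_slices, bits, axis=0)
    # Widen to int32 so the integer accumulation cannot overflow for moderate k.
    A_slices = [S.astype(np.int32) for S in A_slices]
    B_slices = [S.astype(np.int32) for S in B_slices]
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.float64)
    for i, Ai in enumerate(A_slices):
        for j, Bj in enumerate(B_slices):
            P = Ai @ Bj                  # exact integer product (the IMMU's job)
            C += P.astype(np.float64) * 2.0 ** (-bits * (i + j + 2))
    return C * a_scale * b_scale         # undo the row and column scaling

rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(8, 16))
B = rng.uniform(-1.0, 1.0, size=(16, 4))
C = sliced_int_matmul(A, B)
max_diff = np.max(np.abs(C - A @ B))     # difference from the plain FP64 product
```

The sketch multiplies every pair of slices, so its cost grows with the square of the slice count. How many slices are needed, and which products can be skipped, depends on the input exponent range and is not something this example tries to settle.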
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Quantum Computing Algorithms and Architecture · Advanced Data Storage Technologies
