Emulation of Complex Matrix Multiplication based on the Chinese Remainder Theorem
Yuki Uchino, Qianxiang Ma, Toshiyuki Imamura, Katsuhisa Ozaki, Patrick Lars Gutsche

TL;DR
This paper introduces high-performance methods for emulating single- and double-precision complex matrix multiplication on INT8 matrix engines using the Chinese Remainder Theorem, reporting a 4.4--6.5x speedup over cuBLAS.
Contribution
It extends the Ozaki-II scheme to complex matrix multiplication, enabling faster emulation on INT8 engines with adjustable accuracy.
Findings
Achieves 4.4--6.5x speedup over cuBLAS for complex matrix multiplication.
Offers an adjustable trade-off: lower accuracy yields higher speed, while higher accuracy costs only modest additional time.
Has the potential to serve as a default algorithm for various applications.
Abstract
Modern computing architectures feature low-precision matrix multiplication units that achieve substantially higher throughput than their high-precision counterparts. Motivated by this architectural trend, the emulation of high-precision matrix multiplication using low-precision hardware has attracted significant interest in the high-performance computing community. Ozaki, Uchino, and Imamura proposed the Ozaki-II scheme as a general framework for emulating matrix multiplication. Building on this framework, Uchino, Ozaki, and Imamura developed high-performance and power-efficient techniques for emulating single- and double-precision real matrix multiplication on INT8 matrix engines. Extending this line of research, the present study proposes high-performance emulation methods for single- and double-precision complex matrix multiplication on INT8 matrix engines, based on the Ozaki-II…
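To make the central idea concrete, the following is a minimal sketch of how the Chinese Remainder Theorem can reduce an exact complex integer matrix product to several small-modulus products whose operands fit in INT8. It is not the authors' implementation. Everything in it is an illustrative assumption: the moduli, the helper names (to_int8_residue, int8_matmul, crt_complex_matmul), the use of four real products per modulus, and the premise that the inputs are already integer-valued. The scaling step that would convert floating-point entries to integers is omitted, and NumPy int32 arithmetic stands in for an INT8 matrix engine with INT32 accumulation.

```python
import numpy as np
from math import prod

# Illustrative, pairwise coprime moduli (an assumption, not taken from the paper).
# Each is at most 256, so symmetric residues fit in INT8.
MODULI = (251, 253, 255, 256)

def to_int8_residue(x, p):
    """Reduce x modulo p into a symmetric range that fits in int8."""
    r = np.mod(x, p)                           # values in [0, p-1]
    r = np.where(r >= (p + 1) // 2, r - p, r)  # shift the upper half to negatives
    return r.astype(np.int8)

def int8_matmul(a, b):
    """Stand-in for an INT8 matrix engine: int8 inputs, int32 accumulation."""
    return a.astype(np.int32) @ b.astype(np.int32)

def crt_complex_matmul(Ar, Ai, Br, Bi, moduli=MODULI):
    """Exact product (Ar + i*Ai)(Br + i*Bi) for integer matrices.

    The result is exact only if every entry of the true product has
    absolute value below prod(moduli) / 2.
    """
    M = prod(moduli)
    acc_r = np.zeros((Ar.shape[0], Br.shape[1]), dtype=np.int64)
    acc_i = np.zeros_like(acc_r)
    for p in moduli:
        ar, ai = to_int8_residue(Ar, p), to_int8_residue(Ai, p)
        br, bi = to_int8_residue(Br, p), to_int8_residue(Bi, p)
        # Four real products per modulus: real = ArBr - AiBi, imag = ArBi + AiBr.
        cr = (int8_matmul(ar, br) - int8_matmul(ai, bi)).astype(np.int64) % p
        ci = (int8_matmul(ar, bi) + int8_matmul(ai, br)).astype(np.int64) % p
        # CRT weight: congruent to 1 modulo p and to 0 modulo every other modulus.
        q = M // p
        w = q * pow(q, -1, p) % M
        acc_r = (acc_r + cr * w) % M
        acc_i = (acc_i + ci * w) % M
    # Map from [0, M) back to the symmetric range to recover signed values.
    half = M // 2
    Cr = np.where(acc_r > half, acc_r - M, acc_r)
    Ci = np.where(acc_i > half, acc_i - M, acc_i)
    return Cr, Ci

# Usage: compare against a direct int64 computation on small random inputs.
rng = np.random.default_rng(0)
Ar, Ai = rng.integers(-1000, 1001, (8, 16)), rng.integers(-1000, 1001, (8, 16))
Br, Bi = rng.integers(-1000, 1001, (16, 8)), rng.integers(-1000, 1001, (16, 8))
Cr, Ci = crt_complex_matmul(Ar, Ai, Br, Bi)
assert np.array_equal(Cr, Ar @ Br - Ai @ Bi)
assert np.array_equal(Ci, Ar @ Bi + Ai @ Br)
```

In this sketch, the product of the moduli bounds the range of values that can be reconstructed exactly, so using more moduli widens that range at the cost of more low-precision products. The three-product (Karatsuba-style) formulation of complex multiplication would be an alternative to the four products used here.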
Taxonomy
Topics: Cryptography and Residue Arithmetic · Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques
