Acceleration of multiple precision matrix multiplication based on multi-component floating-point arithmetic using AVX2
Tomonori Kouya

TL;DR
This paper presents a method for accelerating multiple precision matrix multiplication with AVX2 SIMD instructions. SIMDized error-free transformation (EFT) functions, combined with AVX2 load/store operations, yield a speedup of more than three times.
Contribution
The paper introduces SIMDized EFT functions for multi-component (double-double, triple-double, and quad-double) arithmetic and shows that they accelerate matrix multiplication on x86_64 systems.
Findings
More than threefold speedup over the non-accelerated implementations
Effective SIMDization of EFT functions for double-double, triple-double, and quad-double arithmetic
Enhanced parallelization performance with OpenMP
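
As a rough illustration of the error-free transformations mentioned in the findings, the scalar C sketch below shows two classic EFTs: Knuth's TwoSum and Dekker's TwoProd with Veltkamp splitting. The function names (`two_sum`, `split`, `two_prod`) are hypothetical and not taken from the paper's code. The paper's versions operate on four binary64 values at once, whereas this sketch handles one value at a time, and it assumes round-to-nearest arithmetic with no overflow in the split step.

```c
/* TwoSum (Knuth): returns s = fl(a + b) and stores the rounding
 * error in *err, so that a + b == s + *err holds exactly. */
static double two_sum(double a, double b, double *err) {
    double s = a + b;
    double v = s - a;                 /* the part of b that ended up in s */
    *err = (a - (s - v)) + (b - v);
    return s;
}

/* Veltkamp split: a == *hi + *lo, each half fitting in 26 bits,
 * so that products of halves are exact in binary64. */
static void split(double a, double *hi, double *lo) {
    double t = 134217729.0 * a;       /* 2^27 + 1 */
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* TwoProd (Dekker): returns p = fl(a * b) and stores the rounding
 * error in *err, so that a * b == p + *err holds exactly. */
static double two_prod(double a, double b, double *err) {
    double p = a * b;
    double ah, al, bh, bl;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    *err = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    return p;
}
```

For example, `two_sum(1.0, 0x1p-60, &e)` returns `1.0` and leaves the otherwise lost term `0x1p-60` in `e`. On hardware with fused multiply-add, TwoProd is often written as a single multiply followed by one FMA instead of the split shown here.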
Abstract
In this paper, we report the results of accelerating multi-binary64-type multiple precision matrix multiplication with AVX2. We target double-double (DD), triple-double (TD), and quad-double (QD) precision arithmetic, which are built on certain types of error-free transformation (EFT) arithmetic. We implement SIMDized EFT functions that compute with four binary64 numbers simultaneously in an x86_64 computing environment, and with their help we develop SIMDized DD, TD, and QD additions and multiplications. In addition, we adopt AVX2 load/store functions to speed up reading and storing matrix elements from/to memory. Owing to these combined techniques, our multiple precision matrix multiplications are more than three times faster than the non-accelerated ones. Our accelerated matrix multiplication modifies the…
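
To make the "four binary64 numbers at once" idea concrete, here is a minimal sketch of double-double addition and multiplication over four lanes, written in plain C rather than with AVX2 intrinsics. The `dd4` type, the struct-of-arrays layout, and all function names are assumptions made for illustration and are not the paper's API. The addition is the simpler "sloppy" DD variant, and the product error uses Dekker's splitting instead of FMA.

```c
#define LANES 4   /* one 256-bit AVX2 register holds four binary64 values */

/* Four double-double numbers in struct-of-arrays form:
 * lane i represents the value hi[i] + lo[i]. */
typedef struct {
    double hi[LANES];
    double lo[LANES];
} dd4;

/* Exact rounding error of p = fl(a * b), via Dekker/Veltkamp splitting. */
static double prod_err(double a, double b, double p) {
    double t, ah, al, bh, bl;
    t = 134217729.0 * a; ah = t - (t - a); al = a - ah;   /* 2^27 + 1 */
    t = 134217729.0 * b; bh = t - (t - b); bl = b - bh;
    return ((ah * bh - p) + ah * bl + al * bh) + al * bl;
}

/* Lane-wise DD addition ("sloppy" variant). */
static dd4 dd4_add(dd4 a, dd4 b) {
    dd4 r;
    for (int i = 0; i < LANES; i++) {
        /* TwoSum on the high parts */
        double s = a.hi[i] + b.hi[i];
        double v = s - a.hi[i];
        double e = (a.hi[i] - (s - v)) + (b.hi[i] - v);
        /* fold in the low parts, then renormalize (QuickTwoSum) */
        e += a.lo[i] + b.lo[i];
        r.hi[i] = s + e;
        r.lo[i] = e - (r.hi[i] - s);
    }
    return r;
}

/* Lane-wise DD multiplication. */
static dd4 dd4_mul(dd4 a, dd4 b) {
    dd4 r;
    for (int i = 0; i < LANES; i++) {
        /* TwoProd on the high parts */
        double p = a.hi[i] * b.hi[i];
        double e = prod_err(a.hi[i], b.hi[i], p);
        /* cross terms; the lo*lo term is below DD precision and dropped */
        e += a.hi[i] * b.lo[i] + a.lo[i] * b.hi[i];
        r.hi[i] = p + e;
        r.lo[i] = e - (r.hi[i] - p);
    }
    return r;
}
```

In an intrinsics-based version, each loop body would presumably map onto 256-bit operations such as `_mm256_add_pd`, `_mm256_sub_pd`, and `_mm256_mul_pd`, with `_mm256_loadu_pd` and `_mm256_storeu_pd` moving four matrix elements between memory and a register at a time. Keeping the high and low parts in separate arrays is one layout that makes such loads contiguous; whether the paper uses this layout is not stated in the abstract.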
