Acceleration of multiple precision matrix multiplication based on multi-component floating-point arithmetic using AVX2

Tomonori Kouya

arXiv:2101.06584·math.NA·September 14, 2021

TL;DR

This paper accelerates multiple precision matrix multiplication with AVX2 SIMD instructions. SIMDized error-free transformation (EFT) functions, combined with AVX2 load/store operations for matrix elements, make the matrix multiplications more than three times faster than the non-accelerated versions.

Contribution

The paper introduces SIMDized EFT functions that compute on four binary64 numbers at once, builds SIMDized DD, TD, and QD additions and multiplications on top of them, and demonstrates their effectiveness in accelerating matrix multiplication on x86_64 systems.

Findings

1. More than threefold speedup of matrix multiplication compared with the non-accelerated implementations

2. Effective SIMDization of EFT functions for double-double, triple-double, and quad-double arithmetic

3. Enhanced parallelization performance with OpenMP

Abstract

In this paper, we report the results obtained from accelerating multi-binary64-type multiple precision matrix multiplication with AVX2. We target double-double (DD), triple-double (TD), and quad-double (QD) precision arithmetic, each built on error-free transformation (EFT) arithmetic. We implement SIMDized EFT functions, which compute with four binary64 numbers simultaneously in an x86_64 computing environment, and use them to develop SIMDized DD, TD, and QD additions and multiplications. In addition, AVX2 load/store functions are adopted to speed up reading matrix elements from memory and storing them back. Owing to these combined techniques, our multiple precision matrix multiplications run more than three times faster than the non-accelerated ones. Our accelerated matrix multiplication modifies the…
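
The EFT building blocks mentioned in the abstract can be illustrated with a small scalar sketch. This is not the paper's AVX2 code: it is plain Python, where a float is a binary64 number, and it processes one value at a time rather than four per SIMD register. All function names here (two_sum, quick_two_sum, split, two_prod, dd_add, dd_mul, dd_matmul) are illustrative assumptions. The product uses Dekker's split-based algorithm, and dd_add is a simple variant without the tighter error bound of more careful DD additions. Only the DD case is shown; a DD number is represented as a (hi, lo) tuple.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b), and s + e equals a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Dekker's FastTwoSum; assumes |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def split(a):
    """Veltkamp split: a == hi + lo, each half fitting in about 26 bits."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's TwoProd: p = fl(a * b), and p + e equals a * b exactly
    (barring overflow and underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Simple double-double addition of (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """Double-double multiplication of (hi, lo) pairs."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)


def dd_matmul(A, B):
    """Naive triple-loop product of matrices whose entries are DD pairs."""
    n, m, q = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * q for _ in range(n)]
    for i in range(n):
        for j in range(q):
            acc = (0.0, 0.0)
            for k in range(m):
                acc = dd_add(acc, dd_mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


# A 1x2 by 2x1 product: binary64 alone would lose the 1e-20 term,
# while the DD result keeps it in the low component.
A = [[(1.0, 0.0), (1e-20, 0.0)]]
B = [[(1.0, 0.0)], [(1.0, 0.0)]]
C = dd_matmul(A, B)
```

In this example the high component of C[0][0] is what ordinary binary64 arithmetic would return, and the low component carries the part that rounding would otherwise discard. A SIMD version along the lines the abstract describes would apply the same sequences of additions and multiplications to four binary64 lanes at once.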
