Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization
Tomonori Kouya

TL;DR
This paper demonstrates how branch-free algorithms combined with SIMD vectorization can significantly speed up multi-component multiple-precision arithmetic on x86 and ARM CPUs, especially for high-precision computations.
Contribution
It introduces and benchmarks branch-free, SIMD-optimized algorithms for multi-component multiple-precision arithmetic on modern CPU architectures.
Findings
Significant acceleration in linear computations and polynomial evaluation.
Effective implementation on both x86 and ARM CPU platforms.
Abstract
Branch-free multiple-precision floating-point algorithms can significantly accelerate multi-component arithmetic, which is implemented by combining hardware-supported binary64 and binary32 values, particularly for triple- and quadruple-precision computations. In this study, we present benchmark results on x86 and ARM CPU platforms that quantify the acceleration obtained in linear computations and polynomial evaluation when these algorithms are integrated.
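To make the idea of multi-component arithmetic concrete, the following is a minimal sketch of double-double addition built from error-free transformations. It is not taken from the paper: the function names (`two_sum`, `quick_two_sum`, `dd_add`) are illustrative, and the choice of Knuth's TwoSum and Dekker's FastTwoSum as building blocks is an assumption about what a branch-free formulation typically looks like. TwoSum needs no comparison of the operand magnitudes, so the same instruction sequence runs for every input, which is the property that makes such kernels amenable to SIMD vectorization.

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b.

    Six floating-point operations, no comparison of |a| and |b|.
    Exact in binary64 as long as no overflow occurs.
    """
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def quick_two_sum(a, b):
    """Dekker's FastTwoSum: three operations, valid only when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e

def dd_add(x, y):
    """Add two double-double values given as (hi, lo) pairs.

    This is the simple ("sloppy") variant: it is cheap and branch-free,
    but its error bound is weaker when the operands have opposite signs.
    """
    s, e = two_sum(x[0], y[0])   # exact sum of the high parts
    e += x[1] + y[1]             # fold in the low parts
    return quick_two_sum(s, e)   # renormalize so that |lo| is small

def dd_to_fraction(x):
    """Exact rational value of a (hi, lo) pair, for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

For example, `dd_add((1.0, 2.0**-60), (1.0, 2.0**-61))` keeps the low-order bits that a plain binary64 addition would discard; `dd_to_fraction` can be used to compare the result against exact rational arithmetic.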
