Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization

Tomonori Kouya

arXiv:2603.14926 · cs.MS · May 8, 2026


TL;DR

This paper demonstrates how branch-free algorithms combined with SIMD vectorization can significantly speed up multi-component multiple-precision arithmetic on x86 and ARM CPUs, particularly for triple- and quadruple-precision computations.
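To illustrate what "branch-free" means at the level of a single error-free addition, here is a minimal C sketch. It is not the paper's code. It assumes round-to-nearest binary64, no overflow, and compilation without -ffast-math. The names `dd`, `fast_two_sum`, `two_sum_branchy` and `two_sum` are hypothetical. Fast2Sum is exact only when |a| >= |b|, so a general caller must compare magnitudes first, which is a data-dependent branch. Knuth's TwoSum spends three extra flops but needs no comparison at all.

```c
/* A double-double value: the unevaluated sum hi + lo. */
typedef struct { double hi, lo; } dd;

/* Fast2Sum: 3 flops, exact only if |a| >= |b| (or a == 0). */
static dd fast_two_sum(double a, double b) {
    dd r;
    r.hi = a + b;
    r.lo = b - (r.hi - a);
    return r;
}

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* General error-free addition built on Fast2Sum: the magnitude test is a
   data-dependent branch, which hampers pipelining and SIMD vectorization. */
static dd two_sum_branchy(double a, double b) {
    if (abs_d(a) >= abs_d(b))
        return fast_two_sum(a, b);
    return fast_two_sum(b, a);
}

/* Knuth's TwoSum: 6 flops, no precondition, no branches.
   hi = fl(a + b) and a + b == hi + lo exactly. */
static dd two_sum(double a, double b) {
    dd r;
    r.hi = a + b;
    double bv = r.hi - a;          /* part of b that reached hi */
    double av = r.hi - bv;         /* part of a that reached hi */
    r.lo = (a - av) + (b - bv);    /* rounding error of a + b */
    return r;
}
```

For example, `two_sum(1.0, 1e-20)` returns hi = 1.0 and lo = 1e-20, recovering the addend that plain binary64 addition discards, and `two_sum_branchy(1e-20, 1.0)` gives the same pair only after swapping its arguments.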

Contribution

It introduces and benchmarks branch-free, SIMD-optimized algorithms for multi-component multiple-precision arithmetic on modern CPU architectures.
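As a rough illustration of how branch-free kernels combine with SIMD, the sketch below adds two double-double vectors stored as a structure of arrays (separate `hi` and `lo` arrays). The loop body contains no data-dependent branches, so optimizing compilers can typically auto-vectorize it. It is a hypothetical example, not the paper's implementation. The type `dd_vec` and the function `dd_vec_add` are illustrative names, and the addition used is the simple ("sloppy") double-double addition, which can lose relative accuracy under heavy cancellation. It assumes round-to-nearest binary64 and no -ffast-math, since reassociation would cancel the error terms.

```c
#include <stddef.h>

/* Structure-of-arrays storage: high and low parts in separate contiguous
   arrays, so one SIMD register can hold the high parts of several elements. */
typedef struct {
    double *hi;
    double *lo;
} dd_vec;

/* z = x + y elementwise in double-double arithmetic. */
static void dd_vec_add(size_t n, dd_vec z, dd_vec x, dd_vec y) {
    for (size_t i = 0; i < n; i++) {
        double a = x.hi[i], b = y.hi[i];
        /* Branch-free TwoSum of the high parts. */
        double s = a + b;
        double bv = s - a;
        double e = (a - (s - bv)) + (b - bv);
        /* Fold both low parts into the error term. */
        e += x.lo[i] + y.lo[i];
        /* Fast2Sum renormalization; |e| is normally much smaller than |s|. */
        double h = s + e;
        z.lo[i] = e - (h - s);
        z.hi[i] = h;
    }
}
```

Storing the components as an array of `{hi, lo}` structs would instead require shuffles to separate high and low parts inside each SIMD register, which is one common reason to prefer this layout for vectorized kernels.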

Findings

1. Significant acceleration in linear computations and polynomial evaluation.

2. Effective implementation on both x86 and ARM CPU platforms.
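Polynomial evaluation is one of the benchmarked workloads. A minimal sketch of Horner's rule with a double-double accumulator is shown below. It is an illustration, not the paper's algorithm. `dd_horner` and the helpers are hypothetical names, and the product uses Dekker's TwoProd with Veltkamp splitting. This assumes round-to-nearest binary64, no overflow, and no floating-point contraction (for GCC/Clang, e.g. -ffp-contract=off), because fusing the splitting multiplications into FMAs can break Dekker's algorithm. On FMA hardware one would normally compute the product error with `fma` instead.

```c
/* A double-double value: the unevaluated sum hi + lo. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: hi = fl(a + b), lo = exact rounding error, no branches. */
static dd two_sum(double a, double b) {
    dd r;
    r.hi = a + b;
    double bv = r.hi - a;
    r.lo = (a - (r.hi - bv)) + (b - bv);
    return r;
}

/* Fast2Sum: exact when |a| >= |b| (or a == 0). */
static dd fast_two_sum(double a, double b) {
    dd r;
    r.hi = a + b;
    r.lo = b - (r.hi - a);
    return r;
}

/* Veltkamp splitting: a == hi + lo with each half fitting in 26 bits. */
static void split(double a, double *hi, double *lo) {
    double t = 134217729.0 * a;   /* 2^27 + 1 */
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker's TwoProd: hi = fl(a * b), lo = exact rounding error. */
static dd two_prod(double a, double b) {
    double ah, al, bh, bl;
    dd r;
    r.hi = a * b;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    r.lo = ((ah * bh - r.hi) + ah * bl + al * bh) + al * bl;
    return r;
}

/* Evaluate c[0] + c[1] x + ... + c[n] x^n (c has n + 1 entries) by
   Horner's rule, keeping the running value r in double-double. */
static dd dd_horner(const double *c, int n, double x) {
    dd r = { c[n], 0.0 };
    for (int i = n - 1; i >= 0; i--) {
        dd p = two_prod(r.hi, x);     /* r.hi * x, exactly as hi + lo */
        p.lo += r.lo * x;             /* contribution of r.lo */
        p = fast_two_sum(p.hi, p.lo);
        dd s = two_sum(p.hi, c[i]);   /* add the next coefficient */
        s.lo += p.lo;
        r = fast_two_sum(s.hi, s.lo);
    }
    return r;
}
```

For p(x) = x^2 - 2x + 1 = (x - 1)^2 at x = 1 + 2^-30, plain binary64 Horner without FMA returns 0, whereas `dd_horner` returns hi = 2^-60, the exact value. Every step is straight-line code, so the same loop can also be applied across many evaluation points with SIMD.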

Abstract

Branch-free multiple-precision floating-point algorithms can significantly accelerate multi-component arithmetic built by combining hardware binary64 and binary32 numbers, particularly for triple- and quadruple-precision computations. In this study, we benchmarked implementations that integrate these algorithms on x86 and ARM CPU platforms to quantify the resulting speedups in linear computations and polynomial evaluation.
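To make "multi-component" concrete for triple and quadruple precision, the following sketch stores a number as a fixed-length array of binary64 components and adds a binary64 value using only fixed-trip-count loops with no data-dependent branches. It does not reproduce the paper's triple- or quadruple-precision algorithms. `NCOMP`, `mc_real` and `mc_add_double` are hypothetical names. The compression pass is a single bottom-up TwoSum sweep (often called VecSum), which makes `c[0]` a good approximation of the whole value but does not by itself guarantee the strict non-overlapping property of a fully renormalized expansion. Round-to-nearest binary64, no overflow, and no -ffast-math are assumed.

```c
#define NCOMP 3  /* 3 = triple-double; 4 would give quadruple-double */

/* Multi-component number: value = c[0] + c[1] + ... + c[NCOMP - 1],
   components ordered by decreasing magnitude. */
typedef struct { double c[NCOMP]; } mc_real;

/* Knuth's TwoSum: s = fl(a + b), e = (a + b) - s exactly, no branches. */
static void two_sum(double a, double b, double *s, double *e) {
    double t = a + b;
    double bv = t - a;
    *e = (a - (t - bv)) + (b - bv);
    *s = t;
}

/* x + b: a TwoSum cascade from the leading component down, then one
   bottom-up TwoSum sweep to compress the result. */
static mc_real mc_add_double(mc_real x, double b) {
    mc_real r;
    double e = b;
    for (int i = 0; i < NCOMP; i++)
        two_sum(x.c[i], e, &r.c[i], &e);
    r.c[NCOMP - 1] += e;   /* fold in the last error: the only rounding */
    for (int i = NCOMP - 1; i > 0; i--)
        two_sum(r.c[i - 1], r.c[i], &r.c[i - 1], &r.c[i]);
    return r;
}
```

Because the loop bounds depend only on `NCOMP`, the compiler can fully unroll both loops into straight-line code. Switching between triple and quadruple precision then changes only the amount of work, not the control flow.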
