Leveraging SIMD for Accelerating Large-number Arithmetic
Subhrajit Das, Abhishek Bichhawat, Yuvraj Patel

TL;DR
This paper introduces DigitsOnTurbo (DoT), a SIMD-based approach to large-number arithmetic that restructures computations into independent, data-parallel operations, achieving speedups of up to 2.3x over prior SIMD implementations in workloads drawn from scientific computing and cryptography.
Contribution
It presents a novel data-parallel restructuring of large-number arithmetic algorithms that leverages SIMD effectively and outperforms prior SIMD implementations.
Findings
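Up to 1.85x speedup for addition and subtraction over prior SIMD implementations
Up to 2.3x speedup for multiplication over prior SIMD implementations
End-to-end throughput gains of up to 19.3% in scientific computing when integrated into state-of-the-art libraries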
Abstract
Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which restructures the computation around independent, data-parallel operations, rather than vectorizing the standard algorithms, thereby leveraging the benefits provided by SIMD. Over prior SIMD implementations, DoT achieves up to 1.85x speedups for addition and subtraction, and 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields up to 4x speedup for addition and subtraction, and up to 2x speedup for multiplication, cascading into end-to-end throughput gains of up to 19.3% for scientific computations, and up to 7.9% latency and 5.9% throughput improvements on cryptographic implementations.
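The "inherent dependencies" mentioned in the abstract can be illustrated with multi-limb addition, where each limb of the schoolbook algorithm waits on the carry from the limb below it. The sketch below is not DoT's algorithm, which this summary does not describe; it is a minimal illustration, under stated assumptions, of the general idea of replacing a serial carry chain with independent per-limb work. It assumes 64-bit limbs stored least-significant first, and all names (`add_sequential`, `add_two_phase`, `LIMB_BITS`) are hypothetical. The carry-resolution step uses a well-known generate/propagate bitmask trick, chosen here only because it is compact.

```python
LIMB_BITS = 64                      # assumed limb width
MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative int into `count` limbs, least significant first."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_sequential(a, b):
    """Schoolbook addition: limb i cannot start until limb i-1 yields its carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> LIMB_BITS
    return out, carry


def add_two_phase(a, b):
    """Restructured addition: per-limb work is independent of other limbs."""
    n = len(a)
    # Phase 1: every limb sum is independent, so it maps onto SIMD lanes.
    sums = [(x + y) & MASK for x, y in zip(a, b)]
    # Bit i of `gen` is set if limb i overflowed and generates a carry.
    gen = sum(1 << i for i in range(n) if sums[i] < a[i])
    # Bit i of `prop` is set if limb i is all ones and would forward a carry.
    prop = sum(1 << i for i in range(n) if sums[i] == MASK)
    # Phase 2: one integer addition on the masks ripples carries through
    # runs of propagating limbs; bit i of `carry_in` is the carry into limb i.
    carry_in = ((gen << 1) + prop) ^ prop
    out = [(sums[i] + ((carry_in >> i) & 1)) & MASK for i in range(n)]
    return out, (carry_in >> n) & 1


if __name__ == "__main__":
    x, y = 2**256 - 1, 1            # worst case: the carry crosses every limb
    limbs, carry = add_two_phase(to_limbs(x, 4), to_limbs(y, 4))
    assert from_limbs(limbs) + (carry << 256) == x + y
```

In this sketch the only cross-limb dependency is the single addition on the two small masks, so the per-limb loops could in principle be replaced by vector compare and add instructions. Whether DoT uses a comparable decomposition is not stated in this summary.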
