Speeding up decimal multiplication
Viktor Krapivensky

TL;DR
This paper describes a portable NTT-based implementation of decimal multiplication that runs 3x-5x faster than the mpdecimal library. It also presents a simple cache-efficient algorithm for in-place matrix transposition, and argues that using two prime moduli instead of three is worthwhile and simplifies answer recovery.
Contribution
The paper provides a portable, faster implementation of decimal multiplication, a simple in-place matrix transposition algorithm that does not seem to be widely known, and an analysis of using two prime moduli instead of three.
Findings
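3x-5x speedup over the mpdecimal library, using only portable techniques
Simple cache-efficient in-place matrix transposition algorithm that does not seem to be widely known
Using two prime moduli instead of three simplifies answer recovery, even in the worst case of a larger input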
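To illustrate why answer recovery with two moduli is simple, here is a minimal sketch of two-modulus reconstruction by the Chinese remainder theorem. The primes `P1` and `P2` are common NTT-friendly choices picked only for illustration, and the function name `crt2` is hypothetical. Neither is taken from the paper, whose actual moduli and recovery code may differ.

```python
# Illustrative NTT-friendly primes (an assumption, not the paper's moduli).
P1 = 998244353   # 119 * 2**23 + 1
P2 = 469762049   # 7 * 2**26 + 1


def crt2(r1, r2, p1=P1, p2=P2):
    """Return the unique x in [0, p1*p2) with x % p1 == r1 and x % p2 == r2."""
    # Modular inverse of p1 modulo p2 (three-argument pow with -1 needs Python 3.8+).
    inv_p1 = pow(p1, -1, p2)
    # Write x = r1 + p1 * k and solve for k modulo p2.
    k = (r2 - r1) * inv_p1 % p2
    return r1 + p1 * k
```

With two moduli the reconstruction is a single multiply-and-add step; with three moduli the same idea has to be applied twice, and the intermediate values grow wider.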
Abstract
Decimal multiplication is the task of multiplying two numbers in base $10^N$. Specifically, we focus on the number-theoretic transform (NTT) family of algorithms. Using only portable techniques, we achieve a 3x-5x speedup over the mpdecimal library. In this paper we describe our implementation and discuss further possible optimizations. We also present a simple cache-efficient algorithm for in-place $2n \times n$ or $n \times 2n$ matrix transposition, the need for which arises in the "six-step algorithm" variation of the matrix Fourier algorithm, and which does not seem to be widely known. Another finding is that use of two prime moduli instead of three makes sense even considering the worst case of increasing the size of the input, and makes for simpler answer recovery.
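The overall technique named in the abstract can be sketched as follows: split each decimal number into limbs in a power-of-ten base, convolve the limb sequences with an NTT, then propagate carries. This is a textbook radix-2 NTT over one illustrative prime with a small base (10^2), so that every convolution coefficient fits below the modulus. It is not the paper's optimized implementation, which uses several moduli and the six-step algorithm. All names here (`ntt`, `to_limbs`, `multiply`, `P`, `G`, `N_DIGITS`) are assumptions made for the example.

```python
P = 998244353        # illustrative NTT-friendly prime, 119 * 2**23 + 1
G = 3                # a primitive root modulo P
N_DIGITS = 2         # decimal digits per limb (assumed small so one prime suffices)
BASE = 10 ** N_DIGITS


def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of two."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes of doubling length.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, -1, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length *= 2
    if invert:
        n_inv = pow(n, -1, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P


def to_limbs(s):
    """Split a decimal string into base-BASE limbs, least significant first."""
    return [int(s[max(0, end - N_DIGITS):end])
            for end in range(len(s), 0, -N_DIGITS)]


def multiply(x, y):
    """Multiply two non-negative decimal strings via NTT convolution."""
    a, b = to_limbs(x), to_limbs(y)
    # Each convolution coefficient must stay below P to be recovered exactly.
    assert min(len(a), len(b)) * (BASE - 1) ** 2 < P
    size = 1
    while size < len(a) + len(b):
        size *= 2
    a += [0] * (size - len(a))
    b += [0] * (size - len(b))
    ntt(a)
    ntt(b)
    c = [u * v % P for u, v in zip(a, b)]
    ntt(c, invert=True)
    # Carry propagation turns raw coefficients back into base-BASE limbs.
    carry = 0
    for i in range(size):
        carry, c[i] = divmod(c[i] + carry, BASE)
    digits = "".join(f"{limb:0{N_DIGITS}d}" for limb in reversed(c))
    return digits.lstrip("0") or "0"
```

A larger limb base shortens the transforms but makes the coefficients exceed a single word-sized prime, which is where transforms modulo several primes and the subsequent answer recovery come in.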
Taxonomy
Topics: Cryptography and Residue Arithmetic · Coding theory and cryptography · Numerical Methods and Algorithms
