Faster truncated integer multiplication

David Harvey

arXiv:1703.00640·cs.SC·August 3, 2023

TL;DR

This paper introduces faster algorithms for computing the low half or the high half of an integer product, taking asymptotically about 75% of the time of a full multiplication, provided the underlying multiplication algorithm works by computing cyclic convolutions of real sequences.

Contribution

It presents new algorithms that compute either the low $n$ bits or the high $n$ bits of the product of two $n$-bit integers for large $n$, instead of computing the full product and discarding half of it.

Findings

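1. The algorithms take asymptotically about 75% of the time of a full multiplication.

2. Applicable when integer multiplication relies on computing cyclic convolutions of real sequences.

3. Speeds up truncated (partial) product computations for large integers; in the reported timings (Table 1), the truncated products take about 84% to 99% of the full-product time at most tested sizes, though not at every size.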

Abstract

We present new algorithms for computing the low $n$ bits or the high $n$ bits of the product of two $n$-bit integers. We show that these problems may be solved in asymptotically 75% of the time required to compute the full $2n$-bit product, assuming that the underlying integer multiplication algorithm relies on computing cyclic convolutions of real sequences.

Tables

Table 1. Timings for full and truncated products. Values in parentheses give the ratio of the truncated-product time to the full-product time.

$n$	low product	(ratio)	high product	(ratio)	full product	GMP
1 000 000	2.90ms	(0.94)	2.93ms	(0.95)	3.07ms	2.68ms
2 154 434	7.11ms	(1.02)	7.27ms	(1.05)	6.95ms	6.93ms
4 641 588	14.7ms	(0.95)	15.2ms	(0.99)	15.4ms	16.6ms
10 000 000	36.2ms	(0.92)	37.8ms	(0.96)	39.5ms	39.1ms
21 544 346	88.6ms	(0.89)	92.6ms	(0.93)	99.2ms	99.4ms
46 415 888	204ms	(0.90)	210ms	(0.93)	227ms	237ms
100 000 000	504ms	(0.84)	514ms	(0.86)	598ms	553ms
215 443 469	1.25s	(0.91)	1.28s	(0.93)	1.37s	1.35s
464 158 883	2.76s	(0.91)	2.81s	(0.93)	3.03s	3.05s
1 000 000 000	6.08s	(0.86)	6.19s	(0.88)	7.05s	6.93s
2 154 434 690	13.9s	(0.86)	14.2s	(0.88)	16.1s	17.0s
4 641 588 833	33.6s	(1.01)	34.6s	(1.04)	33.4s	38.1s
10 000 000 000	109s	(1.37)	110s	(1.38)	79.8s	81.6s

Equations

$$\|F\|\coloneqq\max_{0\leqslant i<N}|F_{i}|.$$

$$F,G\in 2^{e}\mathbf{R}_{p}[X]/(X^{N}-1).$$

$$H_{k}\coloneqq\sum_{i+j\equiv k\bmod N}F_{i}G_{j}\in\bigl[-2^{2e}N,\,2^{2e}N\bigr],\qquad 0\leqslant k<N.$$

$$\widetilde{H}\in 2^{2e+\lg N}\mathbf{R}_{p}[X]/(X^{N}-1)$$

$$\|\widetilde{H}-H\|<2^{2e+\lg N-p}.$$

$$u=\sum_{i=0}^{N-1}u_{i}2^{ib},\qquad v=\sum_{i=0}^{N-1}v_{i}2^{ib},$$

$$\overline{U}(X)\coloneqq\sum_{i=0}^{N-1}u_{i}X^{i},\qquad\overline{V}(X)\coloneqq\sum_{i=0}^{N-1}v_{i}X^{i},\qquad\overline{U},\overline{V}\in 2^{b}\mathbf{R}_{p}[X]/(X^{2N}-1).$$

$$\overline{W}\in 2^{2b+\lg 2N}\mathbf{R}_{p}[X]/(X^{2N}-1)$$

$$\|\overline{W}-\overline{U}\,\overline{V}\|<2^{2b+\lg 2N-p}.$$

$$\operatorname{\mathsf{M}}(n)=\operatorname{\mathsf{C}}(2N,\,2b+\lg N+2)+O(Nb).$$

$$U(X)\coloneqq\sum_{i=0}^{N-1}u_{i}X^{i}\in\mathbf{Z}[X],\qquad V(X)\coloneqq\sum_{i=0}^{N-1}v_{i}X^{i}\in\mathbf{Z}[X],$$

$$W(X)\coloneqq U(X)V(X)=\sum_{i=0}^{2N-2}w_{i}X^{i}\in\mathbf{Z}[X].$$

$$|\overline{W}_{i}-w_{i}|<2^{2b+\lg 2N-p}=\tfrac{1}{2}$$

$$\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)=\operatorname{\mathsf{C}}(N,\,3b+\lg N+6)+O(N\operatorname{\mathsf{M}}(b)).$$

$$\operatorname{\mathsf{M}_{\mathrm{hi}}}(n)=\operatorname{\mathsf{C}}(N,\,3b+\lg N+9)+O(N\operatorname{\mathsf{M}}(b)).$$

$$\operatorname{\mathsf{C}}(kN,cp)=(kc+o(1))\operatorname{\mathsf{C}}(N,p).$$

$$2b+\lg N+2=(2+o(1))\,b,$$

$$\operatorname{\mathsf{M}}(n)=\operatorname{\mathsf{C}}(2N,\,2b+\lg N+2)+O(Nb)=(4+o(1))\operatorname{\mathsf{C}}(N,b).$$

$$\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)=\operatorname{\mathsf{C}}(N,\,3b+\lg N+6)+O(N\operatorname{\mathsf{M}}(b))=(3+o(1))\operatorname{\mathsf{C}}(N,b).$$

$$\frac{\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)}{\operatorname{\mathsf{M}}(n)}=\frac{3}{4}+o(1),$$

$$b\geqslant 4,\qquad N\geqslant 3$$

$$W(X)=\sum_{i=0}^{2N-2}w_{i}X^{i}.$$

$$A(X)\coloneqq X^{N}+2^{-b}X-1\in\mathbf{R}[X],$$

$$L(2^{b})=\sum_{i=0}^{N-1}w_{i}2^{ib}.$$

$$W(X)=\sum_{i=0}^{2N-2}w_{i}X^{i}=\sum_{i=0}^{N-1}w_{i}X^{i}+\sum_{i=0}^{N-2}w_{N+i}X^{N+i}.$$

$$W(X)\equiv\sum_{i=0}^{N-1}w_{i}X^{i}+\sum_{i=0}^{N-2}w_{N+i}(1-2^{-b}X)X^{i}\pmod{A(X)}.$$

$$L(X)=\sum_{i=0}^{N-1}w_{i}X^{i}+\sum_{i=0}^{N-2}w_{N+i}(1-2^{-b}X)X^{i}.$$

$$A(2^{b})=2^{Nb}+2^{-b}2^{b}-1=2^{Nb}.$$

$$4|z|\leqslant|z|^{3}\leqslant|z|^{N}=|z^{N}|=|1-2^{-b}z|\leqslant 1+\frac{|z|}{16}\leqslant 2|z|,$$

$$A(z)=z^{N}+2^{-b}z-1=0,\qquad A^{\prime}(z)=Nz^{N-1}+2^{-b}=0,$$

Taxonomy

Topics: Coding theory and cryptography · Numerical Methods and Algorithms · Cryptography and Residue Arithmetic

Full text

School of Mathematics and Statistics, University of New South Wales, Sydney NSW 2052, Australia

Abstract.

We present new algorithms for computing the low $n$ bits or the high $n$ bits of the product of two $n$-bit integers. We show that these problems may be solved in asymptotically $75\%$ of the time required to compute the full $2n$-bit product, assuming that the underlying integer multiplication algorithm relies on computing cyclic convolutions of sequences of real numbers.

1. Introduction

Let $n\geqslant 1$ and let $u$ and $v$ be integers in the interval $0\leqslant u,v<2^{n}$ . We write $\operatorname{\mathsf{M}}(n)$ for the cost of computing the full product of $u$ and $v$ , which is just the usual $2n$ -bit product $uv$ . Unless otherwise specified, by ‘cost’ we mean the number of bit operations, under a model such as the multitape Turing machine [14].

In this paper we are interested in two types of truncated product. The low product of $u$ and $v$ is the unique integer $w$ in the interval $0\leqslant w<2^{n}$ such that $w\equiv uv\pmod{2^{n}}$ , or in other words, the low $n$ bits of $uv$ . We denote the cost of computing the low product by $\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)$ .

The high product of $u$ and $v$ consists of the high $n$ bits of $uv$ , except that we allow a small error in the lowest bit. More precisely, the high product is defined to be any integer $w$ in the range $0\leqslant w\leqslant 2^{n}$ such that $|uv-2^{n}w|<2^{n}$ . Thus there are at most two possible values for the high product, and an algorithm that computes it is permitted to return either one. We denote the cost of computing the high product by $\operatorname{\mathsf{M}_{\mathrm{hi}}}(n)$ .
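Both definitions can be checked directly with Python's arbitrary-precision integers. The sketch below is only a reference implementation of the definitions (it forms the full product and truncates it), not one of the fast algorithms of this paper; the function names are hypothetical.

```python
def low_product(u, v, n):
    """Low product: the unique w in [0, 2^n) with w = u*v (mod 2^n)."""
    return (u * v) & ((1 << n) - 1)


def is_high_product(u, v, n, w):
    """Definition of the high product: 0 <= w <= 2^n and |u*v - 2^n * w| < 2^n."""
    return 0 <= w <= (1 << n) and abs(u * v - (w << n)) < (1 << n)


def high_product_candidates(u, v, n):
    """All admissible high products: floor(uv / 2^n) always qualifies, and
    floor(uv / 2^n) + 1 also qualifies unless uv is a multiple of 2^n."""
    q = (u * v) >> n
    return [w for w in (q, q + 1) if is_high_product(u, v, n, w)]
```

For example, with $n=8$ and $u=v=255$ we have $uv=65025=254\cdot 2^{8}+1$, so the low product is $1$ and both $254$ and $255$ are admissible high products.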

There are many applications of truncated products in computer arithmetic. The most obvious example is high-precision arithmetic on real numbers: to compute an $n$-bit approximation to the product of two real numbers with $n$-bit mantissae, we may scale by an appropriate power of two to convert the inputs to $n$-bit integers, and then compute the high product of those integers. Further examples include Barrett’s [1] and Montgomery’s [12] algorithms for modular arithmetic.

It is natural to ask whether a truncated product can be computed more quickly than a full product. This is indeed the case for small $n$ : in the classical quadratic-time regime, we can compute a truncated product in about half the time of a full product, because essentially only half of the $n^{2}$ bit-by-bit products contribute to the desired output.
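A chunk-level analogue of this observation can be sketched in Python: split $u$ and $v$ into $N$ chunks of $b$ bits and accumulate only the partial products $u_{i}v_{j}$ with $i+j<N$, since every other term is a multiple of $2^{Nb}\geqslant 2^{n}$. Using word-sized chunks rather than single bits, and the function name, are illustrative assumptions.

```python
def schoolbook_low_product(u, v, n, b):
    """Low n bits of u*v from b-bit chunks, using only the partial products
    u_i * v_j with i + j < N, where N = ceil(n / b).
    Returns (low product, number of chunk multiplications performed)."""
    N = -(-n // b)                          # number of b-bit chunks
    mask = (1 << b) - 1
    us = [(u >> (i * b)) & mask for i in range(N)]
    vs = [(v >> (j * b)) & mask for j in range(N)]
    acc, count = 0, 0
    for i in range(N):
        for j in range(N - i):              # terms with i + j >= N vanish mod 2^n
            acc += (us[i] * vs[j]) << ((i + j) * b)
            count += 1
    return acc & ((1 << n) - 1), count
```

The count is $N(N+1)/2$ chunk products instead of $N^{2}$ for the full product, which is the "about half" saving of the quadratic regime.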

However, as $n$ grows, and more sophisticated multiplication algorithms are deployed, these savings begin to dissipate. Consider for instance Karatsuba’s algorithm, which has complexity $\operatorname{\mathsf{M}}(n)=O(n^{\alpha})$ for $\alpha=\log 3/\log 2\approx 1.58$ . Mulders showed [13] that Karatsuba’s algorithm may be adapted to obtain bounds for $\operatorname{\mathsf{M}_{\mathrm{hi}}}(n)$ and $\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)$ around $0.81\operatorname{\mathsf{M}}(n)$ . However, it is not known how to reach $0.5\operatorname{\mathsf{M}}(n)$ in this regime.

For much larger values of $n$ , the most efficient integer multiplication algorithms known are based on FFTs (fast Fourier transforms). Currently, the asymptotically fastest such algorithm has complexity $\operatorname{\mathsf{M}}(n)=O(n\log n)$ [10], and it is widely believed that this bound is optimal up to a constant factor.

It has long been thought that the best way to compute a truncated product using FFT-based algorithms is to simply compute the full product and then discard the unwanted part of the output. One might be able to save $O(n)$ bit operations compared to the full product by skipping computations that do not contribute to the desired half of the output, but no bounds of the type $\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)<c\operatorname{\mathsf{M}}(n)$ or $\operatorname{\mathsf{M}_{\mathrm{hi}}}(n)<c\operatorname{\mathsf{M}}(n)$ have been proved for any constant $c<1$ .

For some closely related problems, one can actually prove that it is not possible to do better than computing the full product. For example, in a suitable algebraic model, the multiplicative complexity of any algorithm that computes the low $n$ coefficients of the product of two polynomials of degree less than $n$ is at least $2n-1$ [4, Thm. 17.14], which is the same as the multiplicative complexity of the full product. By analogy, one might expect the same sort of lower bound to apply to truncated integer multiplication.

In this paper we show that this belief is mistaken: we present algorithms that compute high and low products of integers in asymptotically $75\%$ of the time required for a full product. The new algorithms require that the underlying integer multiplication is carried out via a cyclic convolution of sequences of real numbers. This includes any real convolution algorithm based on FFTs, and in particular the $O(n\log n)$ multiplication algorithm of [10].

Unfortunately, because the new methods rely heavily on the archimedean property of $\mathbf{R}$ , we do not yet know how to obtain this 25% reduction in complexity for arbitrary integer multiplication algorithms. In particular, we are currently unable to establish analogous results for integer multiplication algorithms based on FFTs over other rings, such as finite fields [15].

Although we focus on time complexity in this paper, the new techniques also have implications for space complexity. For example, to multiply two floating-point numbers with $n$-bit mantissae using standard FFT methods, the transform of each multiplicand occupies roughly $4n$ bits of storage. This is true regardless of the type of FFT algorithm used; it holds for FFTs over finite fields just as well as for real or complex FFTs. Using the new methods, the storage required drops to roughly $3n$ bits. This improvement in space complexity may be significant in applications where storage is the bottleneck, such as extremely high-precision evaluation of numerical constants like $\pi$. Furthermore, if the computation is I/O-bound, then this may lead directly to a corresponding improvement in computation time. We will not explore this issue further in this paper.

The remainder of the paper is structured as follows. In Section 2 we state our main results precisely, after giving some preliminary definitions. Section 3 presents the new algorithm for the low product, including the proof of correctness and complexity analysis. Section 4 does the same for the high product. Section 5 gives some performance data for an implementation of the new algorithms.

Historical note. An earlier version of this paper pointed out that the new methods for truncated multiplication may be used to design an integer multiplication algorithm having complexity $\operatorname{\mathsf{M}}(n)=O(n\log n\,K^{\log^{*}n})$ with $K=6$ . At the time, this was the best known asymptotic bound for $\operatorname{\mathsf{M}}(n)$ . This result was subsequently superseded by [8] ( $K=4\sqrt{2}\approx 5.66$ ), [9] ( $K=4$ ), and then [10] ( $\operatorname{\mathsf{M}}(n)=O(n\log n)$ ).

2. Setup and statement of results

2.1. Fixed point arithmetic and real convolutions

We write $\lg x$ for $\lceil\log_{2}x\rceil$ . To simplify analysis of numerical error, all algorithms are assumed to work with the following fixed-point representation for real numbers. (See [11, §3] for a more detailed treatment.) Let $p\geqslant 1$ be a precision parameter. We write $\mathbf{R}_{p}$ for the set of real numbers of the form $a/2^{p}$ where $a$ is an integer in the interval $-2^{p}\leqslant a\leqslant 2^{p}$ . Thus $\mathbf{R}_{p}$ models the unit interval $[-1,1]$ , and elements of $\mathbf{R}_{p}$ are represented using $p+O(1)$ bits of storage. For $e\in\mathbf{Z}$ , we write $2^{e}\mathbf{R}_{p}$ for the set of real numbers of the form $2^{e}x$ where $x\in\mathbf{R}_{p}$ . An element of $2^{e}\mathbf{R}_{p}$ is represented simply by its mantissa in $\mathbf{R}_{p}$ ; the exponent $e$ is always known from context, and is not explicitly stored.
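A small illustration of this representation, assuming round-to-nearest when a real number is converted into $\mathbf{R}_{p}$ (the text does not fix a rounding rule). Python's `fractions` module keeps the arithmetic exact; the helper names are hypothetical.

```python
from fractions import Fraction


def to_Rp(x, p):
    """Round a real x in [-1, 1] to the nearest element a / 2^p of R_p.
    Returns the integer mantissa a, with -2^p <= a <= 2^p."""
    a = round(Fraction(x) * 2**p)           # nearest integer (ties to even)
    return max(-(2**p), min(2**p, a))       # clamp, in case x lies slightly outside [-1, 1]


def from_2e_Rp(a, e, p):
    """Value of the element of 2^e R_p whose stored mantissa is a / 2^p.
    Only a is stored; the exponent e is known from context."""
    return Fraction(a, 2**p) * Fraction(2) ** e
```

Each mantissa needs $p+O(1)$ bits, and the rounding error of `to_Rp` is at most $2^{-p-1}$.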

We will frequently work with quotient rings of the form $\mathbf{R}[X]/P(X)$ where $P(X)$ is some fixed monic polynomial of positive degree, such as $X^{N}-1$ . If $F\in\mathbf{R}[X]/P(X)$ and $\deg P=N$ , we write $F_{0},\ldots,F_{N-1}$ for the coefficients of $F$ with respect to the standard monomial basis; that is, $F=F_{0}+\cdots+F_{N-1}X^{N-1}\pmod{P(X)}$ . For such $F$ we define a norm

$$\|F\|\coloneqq\max_{0\leqslant i<N}|F_{i}|.\tag{2.1}$$

We write $2^{e}\mathbf{R}_{p}[X]/P(X)$ for the set of polynomials $F\in\mathbf{R}[X]/P(X)$ whose coefficients $F_{0},\ldots,F_{N-1}$ lie in $2^{e}\mathbf{R}_{p}$ ; this is a slight abuse of notation, as $2^{e}\mathbf{R}_{p}$ is not really a ring. Algorithms always represent such a polynomial by its coefficient vector $(F_{0},\ldots,F_{N-1})\in(2^{e}\mathbf{R}_{p})^{N}$ .

We assume that we have available a subroutine Convolution with the following properties. It takes as input two parameters $N\geqslant 2$ and $p\geqslant 1$ , and polynomials

$$F,G\in 2^{e}\mathbf{R}_{p}[X]/(X^{N}-1).$$

Let $H\coloneqq FG\in\mathbf{R}[X]/(X^{N}-1)$ ; more explicitly,

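$$H_{k}\coloneqq\sum_{i+j\equiv k\bmod N}F_{i}G_{j}\in\bigl[-2^{2e}N,\,2^{2e}N\bigr],\qquad 0\leqslant k<N.$$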

Then Convolution is required to output a polynomial

$$\widetilde{H}\in 2^{2e+\lg N}\mathbf{R}_{p}[X]/(X^{N}-1)$$

such that

$$\|\widetilde{H}-H\|<2^{2e+\lg N-p}.$$

In other words, Convolution computes a $p$-bit approximation to the cyclic convolution of two real input sequences of length $N$.

We write $\operatorname{\mathsf{C}}(N,p)$ for the bit complexity of Convolution. We treat this routine as a black box; its precise implementation is not important for our purposes. A typical implementation would execute a real-to-complex FFT for each input sequence, multiply the Fourier coefficients pointwise, and then compute an inverse complex-to-real transform to recover the result. Internally, it should work to precision slightly higher than $p$ to control rounding errors during intermediate computations. (For an explicit error bound, see for example [3, Theorem 3.6].) The routine may completely ignore the exponent parameter $e$ .
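The typical implementation described above can be sketched as follows. This is a simplified stand-in for Convolution, not the routine analysed in the paper: it uses a plain recursive radix-2 complex FFT (so $N$ must be a power of two), treats the real inputs as complex numbers instead of using a real-to-complex transform, and works in double precision rather than in $\mathbf{R}_{p}$ with a proven error bound.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT of a list of complex numbers (len(a) a power of 2).
    With invert=True it computes the unnormalised inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def convolution(f, g):
    """Approximate cyclic convolution H_k = sum_{i+j = k mod N} f_i g_j of two
    real sequences of the same power-of-two length N, in double precision."""
    N = len(f)
    F = fft([complex(x) for x in f])
    G = fft([complex(x) for x in g])
    H = fft([x * y for x, y in zip(F, G)], invert=True)   # pointwise product, then inverse
    return [h.real / N for h in H]
```

Each transform costs $O(N\log N)$ arithmetic operations; a genuine implementation would choose its internal precision so that the output meets the error bound required of Convolution.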

2.2. The full product

For completeness, we recall the well-known algorithm that uses Convolution to compute the full product of two $n$-bit integers (Algorithm 2.1 below). It depends on two parameters: a chunk size $b$, and a transform length $N$, where $Nb\geqslant n$. The idea is to cut the integers into $N$ chunks of $b$ bits, thereby reducing the integer multiplication problem to the problem of multiplying two polynomials in $\mathbf{Z}[X]$ modulo $X^{2N}-1$.

We will not discuss in this paper the question of optimising the choice of $b$ and $N$. The optimal choice of $N$ will involve a balance between making $N$ as close to $n/b$ as possible and ensuring that $N$ is sufficiently smooth (has only small prime factors) so that FFTs of length $N$ are as efficient as possible. (An alternative approach is to use “truncated FFTs” [18], which eliminates the need to choose a smooth transform length. However, this makes no difference asymptotically. Despite the overlapping terminology, it is not clear whether the new truncated multiplication algorithms can be adapted to the case of truncated FFTs. This is an interesting question for future research.)

Theorem 2.1 (Full product).

Let $n\geqslant 1$, and let $u$ and $v$ be $n$-bit integers. Let $b\geqslant 1$ and $N\geqslant 2$ be integers such that $Nb\geqslant n$. Then Algorithm 2.1 correctly computes the full product of $u$ and $v$. Assuming that $\lg N=O(b)$, its complexity is

$$\operatorname{\mathsf{M}}(n)=\operatorname{\mathsf{C}}(2N,\,2b+\lg N+2)+O(Nb).$$

Proof.

The condition $Nb\geqslant n$ ensures that the decompositions of $u$ and $v$ into $u_{0},\ldots,u_{N-1}$ and $v_{0},\ldots,v_{N-1}$ in line 2 are legal. Let

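$$U(X)\coloneqq\sum_{i=0}^{N-1}u_{i}X^{i}\in\mathbf{Z}[X],\qquad V(X)\coloneqq\sum_{i=0}^{N-1}v_{i}X^{i}\in\mathbf{Z}[X],$$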

and

$$W(X)\coloneqq U(X)V(X)=\sum_{i=0}^{2N-2}w_{i}X^{i}\in\mathbf{Z}[X].$$

Note that $\overline{U}$ and $\overline{V}$ are the images of $U$ and $V$ in $2^{b}\mathbf{R}_{p}[X]/(X^{2N}-1)$, and by construction $u=U(2^{b})$ and $v=V(2^{b})$. Since $W(X)$ has degree at most $2N-2$, it is determined by its remainder modulo $X^{2N}-1$. Line 3 computes an approximation $\overline{W}$ to this remainder with

$$|\overline{W}_{i}-w_{i}|<2^{2b+\lg 2N-p}=\tfrac{1}{2}$$

for each $i$. The function $\operatorname{round}(\cdot)$ in line 4 rounds its argument to the nearest integer, with ties broken in either direction as convenient. Since $w_{i}\in\mathbf{Z}$, we deduce that $\operatorname{round}(\overline{W}_{i})=w_{i}$ for each $i$; hence line 4 returns $W(2^{b})=U(2^{b})V(2^{b})=uv$.

The main term in the complexity bound arises from the Convolution call in line 3. The secondary term consists of the splitting step in line 2, which costs $O(b)$ bit operations per coefficient, and the overlap-add procedure in line 4, which requires $O(b+\lg N)=O(b)$ bit operations per coefficient. ∎
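Algorithm 2.1 itself is not reproduced in this text, so the following is only a sketch of the reduction described in the proof, assuming a naive $O(N^{2})$ double-precision cyclic convolution as a stand-in for Convolution. It splits $u$ and $v$ into $b$-bit chunks, zero-pads to length $2N$, convolves, rounds each coefficient to the nearest integer, and evaluates at $X=2^{b}$ (the overlap-add step). Python floats are exact here as long as every coefficient stays below $2^{53}$, which the chosen parameters ensure.

```python
def cyclic_convolution(f, g):
    """Stand-in for Convolution: naive O(N^2) cyclic convolution of two real
    sequences of equal length, in double precision."""
    N = len(f)
    h = [0.0] * N
    for i in range(N):
        for j in range(N):
            h[(i + j) % N] += f[i] * g[j]
    return h


def full_product(u, v, n, b, N):
    """Full product via chunking, a length-2N cyclic convolution, rounding,
    and overlap-add (evaluation of W at X = 2^b)."""
    assert N * b >= n and 0 <= u < 2**n and 0 <= v < 2**n
    mask = (1 << b) - 1
    U = [float((u >> (i * b)) & mask) for i in range(N)] + [0.0] * N
    V = [float((v >> (i * b)) & mask) for i in range(N)] + [0.0] * N
    W = cyclic_convolution(U, V)    # no wrap-around, since deg W <= 2N - 2
    w = [round(x) for x in W]       # w_i are integers; rounding removes numerical error
    return sum(wi << (i * b) for i, wi in enumerate(w))
```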

2.3. Statement of results

The main results of this paper are the following analogues of Theorem 2.1 for the low product and high product. These are proved in Section 3 and Section 4 respectively.

Theorem 2.2 (Low product).

Let $n\geqslant 1$, and let $u$ and $v$ be $n$-bit integers. Let $b\geqslant 4$ and $N\geqslant 3$ be integers such that $Nb\geqslant n$. Then Algorithm 3.1 (see §3) correctly computes the low product of $u$ and $v$. Assuming that $\lg N=O(b)$, its complexity is

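$$\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)=\operatorname{\mathsf{C}}(N,\,3b+\lg N+6)+O(N\operatorname{\mathsf{M}}(b)).$$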

Theorem 2.3 (High product).

Let $n\geqslant 1$, and let $u$ and $v$ be $n$-bit integers. Let $b\geqslant 4$ and $N\geqslant 3$ be integers such that $(N+1)b\geqslant n+\lg N+2$. Then Algorithm 4.1 (see §4) correctly computes the high product of $u$ and $v$. Assuming that $\lg N=O(b)$, its complexity is

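$$\operatorname{\mathsf{M}_{\mathrm{hi}}}(n)=\operatorname{\mathsf{C}}(N,\,3b+\lg N+9)+O(N\operatorname{\mathsf{M}}(b)).$$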

Comparing these complexity bounds to Theorem 2.1 (the full product), we observe that the convolution length has dropped from $2N$ to $N$ , but the working precision has increased from roughly $2b$ to roughly $3b$ . To understand the implications for the overall complexity, we need to make further assumptions on the behaviour of $\operatorname{\mathsf{C}}(N,p)$ . We consider two scenarios.

Scenario #1: asymptotic behaviour as $n\to\infty$. Assume that the transform length $N$ is restricted to suitably smooth values, such as the ultrasmooth numbers defined by Bernstein [2], so that asymptotically 100% of the FFT work is performed using radix-2 transforms. Assume also that the working precision $p$ is exponentially smaller than $N$, but somewhat larger than $\log N$, say $p=\Theta(\log^{2}N)$. Under these assumptions it is reasonable to expect that the complexity of the underlying real convolution is quasi-linear with respect to the total bit size $Np$. In particular, for any absolute constants $k\geqslant 1$ and $c>0$, where $k$ is an integer, we should have

$$\operatorname{\mathsf{C}}(kN,cp)=(kc+o(1))\operatorname{\mathsf{C}}(N,p).$$

This is the case for all FFT-based convolution algorithms known to the author.

Now, given some large $n$ , assume that we choose $N$ and $b$ such that $Nb=(1+o(1))n$ and $b=\Theta(\log^{2}n)$ , as is done for example in [11, §6]. Then

$$2b+\lg N+2=(2+o(1))\,b,$$

so according to Theorem 2.1 we have

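$$\operatorname{\mathsf{M}}(n)=\operatorname{\mathsf{C}}(2N,\,2b+\lg N+2)+O(Nb)=(4+o(1))\operatorname{\mathsf{C}}(N,b).$$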

On the other hand, for the low product, Theorem 2.2 yields

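$$\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)=\operatorname{\mathsf{C}}(N,\,3b+\lg N+6)+O(N\operatorname{\mathsf{M}}(b))=(3+o(1))\operatorname{\mathsf{C}}(N,b).$$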

We conclude that

$$\frac{\operatorname{\mathsf{M}_{\mathrm{lo}}}(n)}{\operatorname{\mathsf{M}}(n)}=\frac{3}{4}+o(1),$$

justifying our assertion that asymptotically the new low product algorithm saves 25% of the total work compared to the full product. Similar remarks apply to the high product.
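The arithmetic behind the 25% figure can be checked with a toy cost model. The sketch assumes $\operatorname{\mathsf{C}}(N,p)=Np$, which satisfies the scaling rule $\operatorname{\mathsf{C}}(kN,cp)=kc\operatorname{\mathsf{C}}(N,p)$ exactly but ignores logarithmic factors and the secondary $O(\cdot)$ terms; it is not a measurement of any implementation.

```python
def lg(x):
    """lg x = ceil(log2 x) for an integer x >= 1."""
    return (x - 1).bit_length()


def C_model(N, p):
    """Toy cost of a length-N convolution at precision p, proportional to N*p.
    This is an assumption; it satisfies C(kN, cp) = kc * C(N, p) exactly."""
    return N * p


def low_to_full_ratio(n, b):
    """M_lo(n) / M(n) under the toy model, keeping only the Convolution terms
    of Theorems 2.1 and 2.2, with N = ceil(n / b)."""
    N = -(-n // b)
    full = C_model(2 * N, 2 * b + lg(N) + 2)
    low = C_model(N, 3 * b + lg(N) + 6)
    return low / full
```

When $b$ is large compared with $\lg N$ the ratio approaches $3/4$; for small $b$ the $\lg N$ and constant terms are not negligible and shift it.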

Scenario #2: fixed word size. Now let us consider the situation faced by a programmer working on a modern microprocessor with hardware support for a fixed word size, such as the 53-bit double-precision floating point type provided by the IEEE 754 standard. In this setting, the Convolution subroutine takes as input two vectors of coefficients represented by this data type, and computes their cyclic convolution using some sort of FFT, taking direct advantage of the hardware arithmetic. We assume that $b$ is chosen as large as possible so that the FFTs can be performed in this way; for example, under IEEE 754 we would require that $3b+\lg N+\beta_{N}\leqslant 53$ for the low product, where $\beta_{N}$ is an allowance for numerical error. Obviously in this scenario it does not make sense to allow $n\to\infty$ , and it also does not quite make sense to measure complexity by the number of “bit operations”. Instead, $n$ should be restricted to lie in some finite (possibly quite large) range, and a more natural measure of complexity is the number of word operations (ignoring issues such as locality and parallelism).

We claim that it is still reasonable to expect a reduction in complexity close to 25%. To see this, consider a full product computation for a given $n$, with splitting parameters $N$ and $b$. Let $N^{\prime}$ and $b^{\prime}$ be the splitting parameters for the corresponding truncated product, for the same value of $n$. We should choose $b^{\prime}$ around $2b/3$ to ensure that we still take maximum advantage of the available floating-point type. Then we should choose $N^{\prime}$ around $3N/2$ to compensate for the smaller chunks. Now observe that (for large $n$) the bulk of the work for the full product consists of FFTs of length $2N$, but for the truncated products the FFT length is reduced to around $3N/2$. Since the FFTs run in quasilinear time (measured in word operations), we expect to see roughly 25% savings.

In practice the expected 25% speedup will be tempered somewhat by the additional linear-time work inherent in the truncated product algorithms, such as the evaluation of $\alpha^{*}$ and $\beta^{*}$ in Algorithm 3.1. The situation is also complicated by the fact that we are constrained to choose smooth transform lengths. Section 5 gives some timings, showing the speedup actually achieved by an implementation.
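The effect of the constant terms in the precision requirements can be illustrated for Scenario #2 with a short Python sketch. It takes the working precisions from Theorems 2.1 and 2.2 ($2b+\lg N+2$ and $3b+\lg N+6$), adds a hypothetical allowance `err_bits` for rounding error (standing in for $\beta_{N}$), and finds the largest chunk size that fits in a 53-bit mantissa. The 3-bit allowance is an assumption, not a value taken from the paper.

```python
def lg(x):
    """lg x = ceil(log2 x) for an integer x >= 1."""
    return (x - 1).bit_length()


def max_chunk(n, width, slack, err_bits=3, word=53):
    """Largest chunk size b (and N = ceil(n / b)) whose working precision
    width*b + lg N + slack, plus the hypothetical allowance err_bits,
    fits in a word-bit mantissa (53 bits for IEEE 754 double precision)."""
    for b in range(word // width, 0, -1):
        N = -(-n // b)
        if width * b + lg(N) + slack + err_bits <= word:
            return b, N
    raise ValueError("no admissible chunk size")


n = 10**6
b_full, N_full = max_chunk(n, width=2, slack=2)   # full product: FFT length 2N
b_low, N_low = max_chunk(n, width=3, slack=6)     # low product: FFT length N
```

For $n=10^{6}$ these assumptions give $b=16$ with transform length $2N=125\,000$ for the full product, and $b^{\prime}=9$ with transform length $N^{\prime}=111\,112$ for the low product, a length ratio of about $0.89$ rather than $0.75$: at this size the additive terms in the precision bounds absorb much of the saving.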

3. The low product

The aim of this section is to prove Theorem 2.2. Throughout the section we fix integers

$$b\geqslant 4,\qquad N\geqslant 3\tag{3.1}$$

as in the statement of the theorem.

3.1. The cancellation trick

The key to the new low product algorithm is the following simple observation.

Proposition 3.1.

Let $W(X)\in\mathbf{Z}[X]$ with $\deg W\leqslant 2N-2$ , say

$$W(X)=\sum_{i=0}^{2N-2}w_{i}X^{i}.$$

Let $L(X)\in\mathbf{R}[X]$ be the remainder on dividing $W(X)$ by

$$A(X)\coloneqq X^{N}+2^{-b}X-1\in\mathbf{R}[X],$$

with $\deg L<N$ . Then $2^{b}L(X)\in\mathbf{Z}[X]$ and

$$L(2^{b})=\sum_{i=0}^{N-1}w_{i}2^{ib}.$$

Proof.

Write

$$W(X)=\sum_{i=0}^{2N-2}w_{i}X^{i}=\sum_{i=0}^{N-1}w_{i}X^{i}+\sum_{i=0}^{N-2}w_{N+i}X^{N+i}.$$

Since $X^{N}\equiv 1-2^{-b}X\pmod{A(X)}$ , we have

$$W(X)\equiv\sum_{i=0}^{N-1}w_{i}X^{i}+\sum_{i=0}^{N-2}w_{N+i}(1-2^{-b}X)X^{i}\pmod{A(X)}.$$

The polynomial on the right hand side has degree at most $N-1$ , so we deduce that

$$L(X)=\sum_{i=0}^{N-1}w_{i}X^{i}+\sum_{i=0}^{N-2}w_{N+i}(1-2^{-b}X)X^{i}.$$

This shows that $2^{b}L(X)\in\mathbf{Z}[X]$ , and the result follows on substituting $X=2^{b}$ . ∎

Later we will apply Proposition 3.1 to a polynomial $W(X)=U(X)V(X)$ analogous to the $W(X)$ encountered earlier in the proof of Theorem 2.1. The proposition shows that after reducing $W(X)$ modulo $A(X)$ and making the substitution $X=2^{b}$ , the $2^{-b}X$ term in $A(X)$ causes the unwanted high-order coefficients of $W(X)$ to disappear; see Figure 1. An alternative point of view is that polynomial multiplication modulo $A(X)$ corresponds roughly to integer multiplication modulo

$$A(2^{b})=2^{Nb}+2^{-b}2^{b}-1=2^{Nb}.$$

To make use of Proposition 3.1 to compute a low product, we must compute $L(X)$ exactly. Note that the coefficients of $L(X)$ lie in $2^{-b}\mathbf{Z}$ rather than $\mathbf{Z}$ . Consequently, to compute $L(X)$ , we must increase the working precision by $b$ bits compared to the precision used in the full product algorithm. This is why the precision parameter in Theorem 2.2 (and Theorem 2.3) is $3b+\lg N+O(1)$ rather than $2b+\lg N+O(1)$ .
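Proposition 3.1 can be checked with exact rational arithmetic. The Python sketch below reduces $W(X)=U(X)V(X)$ modulo $A(X)$ using $X^{N}\equiv 1-2^{-b}X$, confirms that $2^{b}L(X)$ has integer coefficients, and recovers the low product from $L(2^{b})$. It forms $W$ by schoolbook multiplication, so it illustrates only the cancellation identity, not the fast algorithm (which obtains the product modulo $A(X)$ approximately, via a cyclic convolution); the function names are hypothetical.

```python
from fractions import Fraction


def reduce_mod_A(coeffs, N, b):
    """Remainder of sum_i coeffs[i] X^i on division by A(X) = X^N + 2^(-b) X - 1,
    eliminating the top coefficient at each step via X^N = 1 - 2^(-b) X (mod A)."""
    c = [Fraction(x) for x in coeffs] + [Fraction(0)] * max(0, N - len(coeffs))
    for d in range(len(c) - 1, N - 1, -1):
        t, c[d] = c[d], Fraction(0)
        c[d - N] += t                   # X^d = X^(d-N) X^N = X^(d-N) - 2^(-b) X^(d-N+1)
        c[d - N + 1] -= t / 2**b
    return c[:N]


def low_product_via_A(u, v, n, b):
    """Low n bits of u*v obtained as L(2^b), where L = (U V) mod A(X)."""
    N = -(-n // b)                      # N chunks of b bits, so N*b >= n
    mask = (1 << b) - 1
    us = [(u >> (i * b)) & mask for i in range(N)]
    vs = [(v >> (i * b)) & mask for i in range(N)]
    w = [0] * (2 * N - 1)               # W = U V exactly, degree <= 2N - 2
    for i in range(N):
        for j in range(N):
            w[i + j] += us[i] * vs[j]
    L = reduce_mod_A(w, N, b)
    assert all((x * 2**b).denominator == 1 for x in L)     # 2^b L(X) lies in Z[X]
    value = sum(x * 2**(i * b) for i, x in enumerate(L))   # L(2^b) = sum_{i<N} w_i 2^(ib)
    assert value.denominator == 1
    return int(value) % (1 << n)
```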

3.2. The roots of $A(X)$

In this section we study the complex roots of the special polynomial $A(X)$ introduced in Proposition 3.1. For $r>0$ , let $D_{r}$ denote the open disc $\{z\in\mathbf{C}:|z|<r\}$ .

Lemma 3.2.

The roots of $A(X)$ lie in $D_{2}$ , and they are all simple.

Proof.

If $z\in\mathbf{C}$ is a root of $A(X)$ and $|z|\geqslant 2$ , then (3.1) implies that

$$4|z|\leqslant|z|^{3}\leqslant|z|^{N}=|z^{N}|=|1-2^{-b}z|\leqslant 1+\frac{|z|}{16}\leqslant 2|z|,$$

which is impossible.

Any multiple root $z$ of $A(X)$ would have to satisfy

$$A(z)=z^{N}+2^{-b}z-1=0,\qquad A^{\prime}(z)=Nz^{N-1}+2^{-b}=0,$$

and hence

$$2^{-b}z\Bigl(1-\frac{1}{N}\Bigr)=1,\qquad\text{i.e.}\qquad z=\frac{2^{b}N}{N-1}.$$

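(The first equation follows by substituting $z^{N}=z\cdot z^{N-1}=-2^{-b}z/N$ into $A(z)=0$.) Thus $z$ is a positive real number, so $A^{\prime}(z)=Nz^{N-1}+2^{-b}>0$, contradicting $A^{\prime}(z)=0$. ∎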

Now consider the function

$$\beta(z)\coloneqq z\,(1-2^{-b}z)^{-1/N},\qquad z\in D_{2^{b}},$$

where $u\mapsto u^{-1/N}$ means the branch that maps $1$ to $1$ .

Lemma 3.3.

The function $\beta(z)$ maps roots of $A(X)$ to roots of $X^{N}-1$ .

Proof.

If $z$ is a root of $A(X)=X^{N}+2^{-b}X-1$ , then

$$\beta(z)^{N}=\frac{z^{N}}{1-2^{-b}z}=\frac{1-2^{-b}z}{1-2^{-b}z}=1.$$

In fact, $\beta(z)$ always sends a root of $A(X)$ to the root of $X^{N}-1$ nearest to it, but we will not prove this. Figure 2 illustrates the situation for $N=12$ and $b=1$ , showing that the roots of $A(X)$ are very close to those of $X^{N}-1$ . (For $b=2$ the roots are already too close together to distinguish at this scale.)
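The conclusions of Lemmas 3.2 and 3.3 can be checked numerically. The sketch takes $\beta(z)=z(1-2^{-b}z)^{-1/N}$, which satisfies $\beta(z)^{N}=z^{N}/(1-2^{-b}z)=1$ at every root of $A(X)$. It finds the roots of $A(X)$ by Newton's method started at each $N$-th root of unity (a choice that works because the roots are so close to them) and evaluates $\beta(z)$ with Python's principal complex power, which agrees with the branch mapping $1$ to $1$ because $1-2^{-b}z$ has positive real part when $|z|<2$ and $b\geqslant 4$. The parameters $N=12$, $b=4$ satisfy (3.1) and are an illustrative choice.

```python
import cmath


def roots_of_A(N, b, iters=50):
    """Numerically locate the roots of A(X) = X^N + 2^-b X - 1 by Newton's method,
    starting at each N-th root of unity. Returns (root of unity, root of A) pairs."""
    eps = 2.0 ** -b
    pairs = []
    for m in range(N):
        omega = cmath.exp(2j * cmath.pi * m / N)
        z = omega
        for _ in range(iters):
            z -= (z**N + eps * z - 1) / (N * z ** (N - 1) + eps)   # Newton step
        pairs.append((omega, z))
    return pairs


def beta(z, N, b):
    """beta(z) = z (1 - 2^-b z)^(-1/N), using the principal branch of the power."""
    return z * (1 - z * 2.0 ** -b) ** (-1.0 / N)
```

In this example each $\beta(z)$ also coincides with the root of unity from which its Newton iteration started, in line with the remark above (which the text does not prove).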

For any $k\in\mathbf{Z}$ , the binomial theorem implies that $\beta(z)^{k}$ is represented on $D_{2^{b}}$ by the series

$$\beta(z)^{k}=\sum_{r\geqslant 0}\beta_{k,r}z^{k+r},$$

where

$$\beta_{k,r}\coloneqq\binom{-k/N}{r}(-2^{-b})^{r}.$$

In particular, the first few terms of $\beta(z)$ are given by

$$\beta(z)=z+\frac{2^{-b}}{N}z^{2}+\frac{(N+1)\,2^{-2b}}{2N^{2}}z^{3}+\cdots.$$

We will need to construct an explicit functional inverse for $\beta(z)$, in order to map the roots of $X^{N}-1$ back to the corresponding roots of $A(X)$. Let $\alpha(z)\in z\,\mathbf{R}[[z]]$ be the formal power series inverse of $\beta(z)$, so that

$$\alpha(\beta(z))=\beta(\alpha(z))=z.$$

The coefficients of $\alpha(z)$ , and of its powers, are given as follows.

Lemma 3.4.

For any $k\geqslant 0$ we have (formally)

$$\alpha(z)^{k}=\sum_{r\geqslant 0}\alpha_{k,r}z^{k+r},$$

where $\alpha_{k,0}\coloneqq 1$ and

$$\alpha_{k,r}\coloneqq\frac{k}{k+r}\binom{(k+r)/N}{r}(-2^{-b})^{r},\qquad r\geqslant 1.$$

In particular, the first few terms of $\alpha(z)$ are

$$\alpha(z)=z-\frac{2^{-b}}{N}z^{2}-\frac{(N-3)\,2^{-2b}}{2N^{2}}z^{3}+\cdots.$$

Proof.

By the Lagrange inversion formula [17, Thm. 5.4.2], for any $n\geqslant k$ we have

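$$n\,[z^{n}]\,\alpha(z)^{k}=k\,[z^{n-k}]\Bigl(\frac{z}{\beta(z)}\Bigr)^{n}=k\,[z^{n-k}]\,(1-2^{-b}z)^{n/N}.$$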

Taking $n\coloneqq k+r$ , for any $r\geqslant 1$ , yields

$$(k+r)\,\alpha_{k,r}=k\binom{(k+r)/N}{r}(-2^{-b})^{r}.$$

Remark 3.5.

It is also possible to write down an explicit formula for $\alpha_{k,r}$ when $k<0$ , but the above argument fails because $k+r$ is zero when $r=-k$ . To handle the $k<0$ case one needs a slightly stronger form of the Lagrange inversion formula; see for example [6, Thm. 2.1.1]. In this paper we only need the case $k\geqslant 0$ .
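The coefficient formula can be tested numerically. The sketch assumes $\alpha_{k,0}=1$ and $\alpha_{k,r}=\frac{k}{k+r}\binom{(k+r)/N}{r}(-2^{-b})^{r}$ for $r\geqslant 1$, which is what the Lagrange inversion formula gives for $\beta(z)=z(1-2^{-b}z)^{-1/N}$. It sums the truncated series for $\alpha(z)$ and checks that it inverts $\beta(z)$ at a few points of $D_{2}$; the truncation at 60 terms and the parameters $N=12$, $b=4$ are arbitrary choices.

```python
def gbinom(x, r):
    """Generalised binomial coefficient binom(x, r) for real x and integer r >= 0."""
    out = 1.0
    for j in range(r):
        out *= (x - j) / (j + 1)
    return out


def alpha_coeff(k, r, N, b):
    """Assumed coefficient alpha_{k,r} of z^(k+r) in alpha(z)^k, for k >= 0."""
    if r == 0:
        return 1.0
    return k / (k + r) * gbinom((k + r) / N, r) * (-(2.0 ** -b)) ** r


def alpha(z, N, b, terms=60):
    """Truncated series alpha(z) = sum_{r < terms} alpha_{1,r} z^(1+r)."""
    return sum(alpha_coeff(1, r, N, b) * z ** (1 + r) for r in range(terms))


def beta(z, N, b):
    """beta(z) = z (1 - 2^-b z)^(-1/N), using the principal branch of the power."""
    return z * (1 - z * 2.0 ** -b) ** (-1.0 / N)
```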

The next result gives some simple bounds for the coefficients $\alpha_{k,r}$ and $\beta_{k,r}$ .

Lemma 3.6.

For all $r\geqslant 0$ and $0\leqslant k<N$ we have

[TABLE]

Proof.

The bounds are trivial for $r=0$ , so assume that $r\geqslant 1$ . For $\beta_{k,r}$ we have

[TABLE]

For $\alpha_{k,r}$ , observe that

[TABLE]

where $\eta\coloneqq(k+r)/N$ . We have $\eta>0$ and

[TABLE]

since $r\geqslant 1$ and $N\geqslant 3$ (see (3.1)); hence $1\leqslant\lceil\eta\rceil\leqslant r$ . Thus

[TABLE]

Corollary 3.7.

The series for $\alpha(z)$ and $\beta(z)$ converge on $D_{2^{b}}$ , and

[TABLE]

Proof.

We already know that $\beta(z)$ converges on $D_{2^{b}}$ , and the convergence of $\alpha(z)$ on $D_{2^{b}}$ follows from Lemma 3.6. If $|z|<2$ , then

[TABLE]

where the last inequality follows from (3.1). This shows that $\alpha(z)$ maps $D_{2}$ into $D_{3}\subseteq D_{2^{b}}$ . A similar argument shows that $\beta(z)$ maps $D_{2}$ into $D_{3}\subseteq D_{2^{b}}$ . Since both $\alpha(z)$ and $\beta(z)$ map $D_{2}$ into the disc of convergence of the other, and since they are inverses formally, they must be inverse functions in the sense of (3.2). ∎

Corollary 3.8.

The functions $\alpha(z)$ and $\beta(z)$ induce mutually inverse bijections between the roots of $X^{N}-1$ and the roots of $A(X)$ .

Proof.

By Lemma 3.2, the polynomial $A(X)$ has $N$ distinct roots $z_{1},\ldots,z_{N}$ in $D_{2}$ . Lemma 3.3 shows that $\beta(z)$ maps $z_{1},\ldots,z_{N}$ to roots of $X^{N}-1$ , and the images must be distinct because Corollary 3.7 implies that $\beta(z)$ is injective on $D_{2}$ . Since $X^{N}-1$ has exactly $N$ roots, every root of $X^{N}-1$ must be the image of some $z_{i}$ , and then $\alpha(z)$ must map this root back to $z_{i}$ . ∎

3.3. Ring isomorphisms

The aim of this section is to construct a pair of mutually inverse ring isomorphisms

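$$\alpha^{*}\colon\mathbf{R}[X]/A(X)\to\mathbf{R}[X]/(X^{N}-1),\qquad\beta^{*}\colon\mathbf{R}[X]/(X^{N}-1)\to\mathbf{R}[X]/A(X).\tag{3.3}$$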

In the main low product algorithm, the role of these maps will be to convert the problem of multiplying two polynomials modulo $A(X)$ into an ordinary cyclic convolution.

The idea of the construction is that for $F\in\mathbf{R}[X]/A(X)$ , we want to define $(\alpha^{*}F)(X)$ to be the composition $F(\alpha(X))$ , regarded as a polynomial modulo ${X^{N}-1}$ , and similarly for $\beta^{*}$ . However, some care is required in interpreting the expression $F(\alpha(X))$ , as $\alpha(z)$ is not a polynomial, but rather a power series. To make this definition precise, we proceed as follows.

For each $r\geqslant 0$ define linear maps

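$$\alpha^{*}_{r}\colon\mathbf{R}[X]/A(X)\to\mathbf{R}[X]/(X^{N}-1),\qquad\beta^{*}_{r}\colon\mathbf{R}[X]/(X^{N}-1)\to\mathbf{R}[X]/A(X)$$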

by the formulas

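$$\alpha^{*}_{r}F\coloneqq X^{r}\sum_{k=0}^{N-1}\alpha_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/(X^{N}-1),\qquad\beta^{*}_{r}F\coloneqq X^{r}\sum_{k=0}^{N-1}\beta_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/A(X).$$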

These maps satisfy the following norm bounds. (Recall that the norm on polynomials is defined as in (2.1).)

Lemma 3.9.

For any $r\geqslant 0$ and $F\in\mathbf{R}[X]/A(X)$ ,

[TABLE]

Proof.

By definition, $\alpha^{*}_{r}F=X^{r}G$ where

$$G\coloneqq\sum_{k=0}^{N-1}\alpha_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/(X^{N}-1).$$

For any $H\in\mathbf{R}[X]/(X^{N}-1)$ we have $\|XH\|=\|H\|$ , because multiplication by $X$ simply permutes the coefficients cyclically. Applying this observation to $G$ repeatedly, and recalling Lemma 3.6, we find that

[TABLE]

Lemma 3.10.

For any $r\geqslant 0$ and $F\in\mathbf{R}[X]/(X^{N}-1)$ ,

[TABLE]

Proof.

The argument is similar to Lemma 3.9, the main difference being that multiplication by $X$ modulo $A(X)$ is slightly more complicated than a cyclic permutation. Let $H=\sum_{k=0}^{N-1}H_{k}X^{k}\in\mathbf{R}[X]/A(X)$ . Since $X^{N}\equiv 1-2^{-b}X\pmod{A(X)}$ , we have

[TABLE]

so $\|XH\|\leqslant(1+2^{-b})\|H\|\leqslant 2\|H\|$ .

Now, since $\beta^{*}_{r}F=X^{r}G$ where $G\coloneqq\sum_{k=0}^{N-1}\beta_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/A(X)$ , using Lemma 3.6 we obtain

[TABLE]
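The two multiplication-by-$X$ rules used in Lemmas 3.9 and 3.10 are easy to state in code. In this Python sketch (function names hypothetical) a polynomial is a list of $N$ coefficients, lowest degree first. The norm is assumed to be the maximum absolute coefficient; this is consistent with the bound $\|FG\|\leqslant N\|F\|\|G\|$ used later, but the definition (2.1) is not shown here. As in the proof of Lemma 3.10, reduction modulo $A(X)$ uses $X^{N}\equiv 1-2^{-b}X$.

```python
def mul_x_cyclic(H: list) -> list:
    """Multiply H = sum H_k X^k by X in R[X]/(X^N - 1): a cyclic shift."""
    return [H[-1]] + H[:-1]


def mul_x_mod_A(H: list, b: int) -> list:
    """Multiply H by X in R[X]/A(X), using X^N = 1 - 2^-b X (mod A(X)).

    The overflowing coefficient H_{N-1} wraps around to the constant term,
    and 2^-b H_{N-1} is also subtracted from the coefficient of X.
    """
    top = H[-1]
    out = [top] + H[:-1]
    out[1] -= 2.0 ** -b * top
    return out


def sup_norm(H: list) -> float:
    """Assumed norm: the largest absolute value of a coefficient."""
    return max(abs(h) for h in H)


H = [1.0, -2.0, 3.0, 4.0]
print(mul_x_cyclic(H))    # [4.0, 1.0, -2.0, 3.0]
print(mul_x_mod_A(H, 4))  # [4.0, 0.75, -2.0, 3.0]
```

The cyclic shift preserves the norm exactly, while the reduction modulo $A(X)$ can increase it by at most the factor $1+2^{-b}$, as in the proof of Lemma 3.10.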

We may now define the maps $\alpha^{*}$ and $\beta^{*}$ in (3.3) by setting

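$$\alpha^{*}F\coloneqq\sum_{r=0}^{\infty}\alpha^{*}_{r}F,\qquad\beta^{*}F\coloneqq\sum_{r=0}^{\infty}\beta^{*}_{r}F.$$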

Lemma 3.9 and Lemma 3.10 guarantee that these series converge coefficientwise, so $\alpha^{*}$ and $\beta^{*}$ are well-defined, and they are clearly linear maps. Moreover, we immediately obtain the following estimates concerning the partial sums of the series.

Lemma 3.11.

For any $F\in\mathbf{R}[X]/A(X)$ and any integer $\lambda\geqslant 0$ we have

[TABLE]

Proof.

For the first claim, observe that

[TABLE]

by Lemma 3.9 and (3.1). The second estimate is proved in a similar way. ∎

Lemma 3.12.

For any $F\in\mathbf{R}[X]/(X^{N}-1)$ and any integer $\lambda\geqslant 0$ we have

[TABLE]

Proof.

Follows from Lemma 3.10, similarly to the proof of Lemma 3.11. ∎

Now we can establish that $\alpha^{*}F$ and $\beta^{*}F$ behave like the desired compositions $F(\alpha(X))$ and $F(\beta(X))$ .

Lemma 3.13.

Let $F\in\mathbf{R}[X]/A(X)$ , and let $z$ be a root of $X^{N}-1$ . Then

[TABLE]

Remark 3.14.

The expression $F(\alpha(z))$ is well-defined since $\alpha(z)$ is a root of $A(X)$ (see Corollary 3.8).

Proof.

By the definition of $\alpha^{*}_{r}$ , and since $z$ is a root of $X^{N}-1$ , we have

[TABLE]

Thus

[TABLE]

Lemma 3.15.

Let $F\in\mathbf{R}[X]/(X^{N}-1)$ , and let $z$ be a root of $A(X)$ . Then

[TABLE]

Proof.

Similar to the proof of Lemma 3.13. ∎

Corollary 3.16.

The maps $\alpha^{*}$ and $\beta^{*}$ are mutually inverse ring isomorphisms between $\mathbf{R}[X]/A(X)$ and $\mathbf{R}[X]/(X^{N}-1)$ .

Proof.

We have already pointed out that $\alpha^{*}$ and $\beta^{*}$ are linear; to show that they are ring homomorphisms we must show that they also respect multiplication. Lemma 3.13 implies that for any $F,G\in\mathbf{R}[X]/A(X)$ and any root $z$ of $X^{N}-1$ , we have

[TABLE]

Since a polynomial in $\mathbf{R}[X]/(X^{N}-1)$ is determined by its values at the roots of $X^{N}-1$ , this shows that $\alpha^{*}(FG)=(\alpha^{*}F)(\alpha^{*}G)$ , and hence that $\alpha^{*}$ is a ring homomorphism. A similar argument using Lemma 3.15 shows that $\beta^{*}$ is a ring homomorphism.

To show that $\alpha^{*}$ and $\beta^{*}$ are inverses, let $F\in\mathbf{R}[X]/A(X)$ and let $z$ be a root of $A(X)$ . Corollary 3.8 implies that

[TABLE]

Since this holds for all roots of $A(X)$ , we see that $\beta^{*}\alpha^{*}F=F$ . A similar argument shows that $\alpha^{*}\beta^{*}F=F$ for all $F\in\mathbf{R}[X]/(X^{N}-1)$ . ∎

Finally, we have the following two results concerning the complexity of approximating $\alpha^{*}$ and $\beta^{*}$ .

Proposition 3.17 (Approximating $\alpha^{*}$ ).

Given as input $F\in 2^{e}\mathbf{R}_{p}[X]/A(X)$ , we may compute $G\in 2^{e+1}\mathbf{R}_{p}[X]/(X^{N}-1)$ such that

$$\|G-\alpha^{*}F\|<2^{e+1-p}$$

in $O(N\operatorname{\mathsf{M}}(p))$ bit operations, assuming that $p=O(b)$ .

Note that the output coefficients can indeed be represented in $2^{e+1}\mathbf{R}_{p}$ thanks to the bound $\|\alpha^{*}F\|\leqslant\frac{16}{15}\|F\|$ (Lemma 3.11 with $\lambda=0$ ). A similar remark applies to Proposition 3.18 below (via Lemma 3.12).

Proof.

Let $\lambda\coloneqq\lceil p/b\rceil$ ; the hypothesis $p=O(b)$ implies that $\lambda=O(1)$ . According to Lemma 3.11,

[TABLE]

and

[TABLE]

To compute the desired $G\in 2^{e+1}\mathbf{R}_{p}[X]/(X^{N}-1)$ such that $\|G-\alpha^{*}F\|<2^{e+1-p}$ , it suffices to ensure that $G$ satisfies

[TABLE]

This may be accomplished by simply evaluating the sum $\sum_{r=0}^{\lambda-1}\alpha^{*}_{r}F$ directly from the definition, with a sufficiently high working precision.

In more detail, we first calculate the coefficients $\alpha_{k,r}$ , for each $r=0,\ldots,\lambda-1$ and $k=0,\ldots,N-1$ . Each one requires $O(\lambda)=O(1)$ operations in $\mathbf{R}$ , using the usual formula for the binomial coefficients. Next we compute the coefficients of the polynomials

[TABLE]

and so on, up to $\alpha^{*}_{\lambda-1}F$ . This costs altogether $O(\lambda N)=O(N)$ operations in $\mathbf{R}$ . Taking the sum of these polynomials costs another $O(\lambda N)$ operations in $\mathbf{R}$ . To ensure that (3.4) holds, it suffices to perform all of these operations with a working precision of $p+O(\log\lambda)=p+O(1)$ significant bits. The details of this error analysis are routine and are omitted. Each such addition, multiplication or division in $\mathbf{R}$ costs $O(\operatorname{\mathsf{M}}(p))$ bit operations, leading to the claimed complexity bound. ∎
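The method in this proof can be transcribed almost literally. The Python sketch below uses double-precision floats in place of $\mathbf{R}_p$ and performs no error analysis. The formula for $\alpha_{k,r}$ is an assumption, because the explicit formula of Lemma 3.4 is not reproduced here: the sketch uses the Lagrange-inversion expression $\alpha_{k,r}=\frac{k}{k+r}\binom{(k+r)/N}{r}(-2^{-b})^{r}$ for $k\geqslant 1$ (and $\alpha_{0,r}=[r=0]$), derived from the assumed form $\beta(z)=z(1-2^{-b}z)^{-1/N}$. All helper names are hypothetical. The usage check compares $\alpha^{*}F$ at each root $w$ of $X^{N}-1$ with $F(\alpha(w))$, where $\alpha(w)$ is computed independently by fixed-point iteration, as Lemma 3.13 predicts.

```python
import cmath
from math import factorial


def gen_binom(x: float, r: int) -> float:
    """Generalized binomial coefficient x (x - 1) ... (x - r + 1) / r!."""
    num = 1.0
    for i in range(r):
        num *= x - i
    return num / factorial(r)


def alpha_coeff(k: int, r: int, N: int, b: int) -> float:
    """alpha_{k,r}, the coefficient of z^(k+r) in alpha(z)^k.

    Assumption: Lagrange inversion of beta(z) = z (1 - 2^-b z)^(-1/N) gives
    alpha_{k,r} = k/(k+r) * binom((k+r)/N, r) * (-2^-b)^r for k >= 1.
    """
    if k == 0:
        return 1.0 if r == 0 else 0.0
    return k / (k + r) * gen_binom((k + r) / N, r) * (-(2.0 ** -b)) ** r


def alpha_star(F: list, b: int, lam: int) -> list:
    """Partial sum alpha^*_0 F + ... + alpha^*_{lam-1} F in R[X]/(X^N - 1).

    alpha^*_r F = X^r * sum_k alpha_{k,r} F_k X^k, so each term alpha_{k,r} F_k
    lands on the coefficient of X^((k + r) mod N).
    """
    N = len(F)
    G = [0.0] * N
    for r in range(lam):
        for k in range(N):
            G[(k + r) % N] += alpha_coeff(k, r, N, b) * F[k]
    return G


def poly_eval(P: list, z: complex) -> complex:
    return sum(p * z**i for i, p in enumerate(P))


def alpha_point(w: complex, N: int, b: int, iters: int = 60) -> complex:
    """alpha(w) computed independently: solve z = w (1 - 2^-b z)^(1/N) by iteration."""
    z = w
    for _ in range(iters):
        z = w * (1 - 2.0 ** -b * z) ** (1.0 / N)
    return z


N, b = 6, 4
F = [1.0, -0.5, 2.0, 0.25, -1.0, 3.0]
G = alpha_star(F, b, lam=16)
for k in range(N):
    w = cmath.exp(2j * cmath.pi * k / N)
    # Lemma 3.13: (alpha^* F)(w) = F(alpha(w)) at every root w of X^N - 1.
    print(k, abs(poly_eval(G, w) - poly_eval(F, alpha_point(w, N, b))))
```

With $b=4$ the terms decay roughly like $2^{-4r}$, so $\lambda=16$ is far more than double precision needs; the paper's choice $\lambda=\lceil p/b\rceil$ plays the same role for a working precision of $p$ bits.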

Proposition 3.18 (Approximating $\beta^{*}$ ).

Given as input $F\in 2^{e}\mathbf{R}_{p}[X]/(X^{N}-1)$ , we may compute $G\in 2^{e+1}\mathbf{R}_{p}[X]/A(X)$ such that

[TABLE]

in $O(N\operatorname{\mathsf{M}}(p))$ bit operations, assuming that $p=O(b)$ .

Proof.

Taking $\lambda\coloneqq\lceil p/(b-1)\rceil$ , the proof proceeds along similar lines to that of Proposition 3.17, replacing the use of Lemma 3.11 by Lemma 3.12. The main difference is that the reductions modulo $A(X)$ lead to slightly more complicated formulas. For example, we have

[TABLE]

The terms with the minus signs are those arising from the $2^{-b}X$ term in $A(X)$ . Overall, there are no more than $O(\lambda^{2})=O(1)$ of these additional terms compared to the proof of Proposition 3.17. ∎
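An analogous Python sketch for this proof, again with doubles in place of $\mathbf{R}_p$ and hypothetical helper names. It assumes $A(X)=X^{N}+2^{-b}X-1$ and $\beta(z)=z(1-2^{-b}z)^{-1/N}$, so that $\beta_{k,r}=\binom{-k/N}{r}(-2^{-b})^{r}$ is the coefficient of $z^{k+r}$ in $z^{k}(1-2^{-b}z)^{-k/N}$. Instead of the explicit reduction formulas mentioned above, it reduces $X^{r}G$ modulo $A(X)$ one power of $X$ at a time; this costs $O(\lambda^{2}N)$ operations, which is still $O(N)$ when $\lambda=O(1)$. The final loop checks Lemma 3.15 at the roots of $A(X)$.

```python
import cmath
from math import factorial


def gen_binom(x: float, r: int) -> float:
    num = 1.0
    for i in range(r):
        num *= x - i
    return num / factorial(r)


def beta_coeff(k: int, r: int, N: int, b: int) -> float:
    """beta_{k,r}: coefficient of z^(k+r) in z^k (1 - 2^-b z)^(-k/N) (assumed beta)."""
    return gen_binom(-k / N, r) * (-(2.0 ** -b)) ** r


def mul_x_mod_A(H: list, b: int) -> list:
    """Multiply by X in R[X]/A(X) using X^N = 1 - 2^-b X."""
    out = [H[-1]] + H[:-1]
    out[1] -= 2.0 ** -b * H[-1]
    return out


def beta_star(F: list, b: int, lam: int) -> list:
    """Partial sum over r < lam of beta^*_r F = X^r * sum_k beta_{k,r} F_k X^k (mod A)."""
    N = len(F)
    out = [0.0] * N
    for r in range(lam):
        G = [beta_coeff(k, r, N, b) * F[k] for k in range(N)]
        for _ in range(r):  # reduce X^r * G modulo A(X), one power of X at a time
            G = mul_x_mod_A(G, b)
        out = [o + g for o, g in zip(out, G)]
    return out


def roots_of_A(N: int, b: int, iters: int = 30) -> list:
    """Roots of A(X) = X^N + 2^-b X - 1 by Newton's method from the N-th roots of unity."""
    c = 2.0 ** -b
    zs = []
    for k in range(N):
        z = cmath.exp(2j * cmath.pi * k / N)
        for _ in range(iters):
            z -= (z**N + c * z - 1) / (N * z ** (N - 1) + c)
        zs.append(z)
    return zs


def poly_eval(P: list, z: complex) -> complex:
    return sum(p * z**i for i, p in enumerate(P))


N, b = 6, 4
F = [1.0, -0.5, 2.0, 0.25, -1.0, 3.0]
G = beta_star(F, b, lam=16)
for z in roots_of_A(N, b):
    beta_z = z * (1 - 2.0 ** -b * z) ** (-1.0 / N)
    # Lemma 3.15: (beta^* F)(z) = F(beta(z)) at every root z of A(X).
    print(abs(poly_eval(G, z) - poly_eval(F, beta_z)))
```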

Remark 3.19.

In the estimates given above, such as Lemma 3.6 and Lemma 3.9, we have opted for shorter proofs rather than the sharpest possible bounds. With more effort, one could prove tighter bounds; this might save a few bits in the main algorithm, but does not affect the asymptotic conclusions of the paper. Similar remarks apply to the high product algorithm in Section 4.

3.4. The main algorithm

We are now in a position to state Algorithm 3.1 and prove the main theorem concerning the computation of the low product.

Proof of Theorem 2.2.

As in the proof of Theorem 2.1, let

[TABLE]

so that $u=U(2^{b})$ and $v=V(2^{b})$ , and let

[TABLE]

The polynomials $\bar{U}$ and $\bar{V}$ in line 2 are just the images of $U$ and $V$ in $2^{b}\mathbf{R}_{p}[X]/A(X)$ . Our goal is to compute $L(X)$ , the remainder on dividing $W(X)$ by $A(X)$ , as in Proposition 3.1. By definition this is equal to $\bar{U}\bar{V}$ .

Line 3 computes approximations $\tilde{U}$ and $\tilde{V}$ to $\alpha^{*}\bar{U}$ and $\alpha^{*}\bar{V}$ .
Line 4 computes $\tilde{W}$ , an approximation to $\tilde{U}\tilde{V}$ (the cyclic convolution of $\tilde{U}$ and $\tilde{V}$ ). Observe that

[TABLE]

In this calculation we have used the fact that $\alpha^{*}$ is a ring homomorphism (Corollary 3.16), and that $\|FG\|\leqslant N\|F\|\|G\|$ for any $F,G\in\mathbf{R}[X]/(X^{N}-1)$ . By Lemma 3.11 (with $\lambda=0$ ) we have

[TABLE]

so

[TABLE]

Line 5 computes $\bar{W}$ , an approximation to $\beta^{*}\tilde{W}$ . Since $\alpha^{*}$ and $\beta^{*}$ are inverses (Corollary 3.16), Lemma 3.12 implies that

[TABLE]

where the last inequality follows from our choice of $p=3b+\lg N+6$ in line 1.

On the other hand, we know from Proposition 3.1 that $2^{b}L(X)$ has integer coefficients, so we deduce that $\operatorname{round}(2^{b}\bar{W}_{i})=2^{b}L_{i}$ for each $i$ . Therefore the sum in line 6 is equal to $L(2^{b})$ ; by Proposition 3.1 this is equal to

[TABLE]

This congruence also holds modulo $2^{n}$ as $Nb\geqslant n$ .

The main term in the complexity bound (2.2) arises from the Convolution call in line 4. The splitting and overlap-add steps in lines 2 and 6 contribute $O(Nb)$ bit operations, as $\lg N=O(b)$ (by hypothesis), and the invocations of Proposition 3.17 and Proposition 3.18 in lines 3 and 5 contribute another $O(N\operatorname{\mathsf{M}}(b))$ bit operations. ∎
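To see how lines 2 to 6 fit together, here is a toy end-to-end low product in Python. It is a sketch, not the paper's implementation, and it makes several simplifications: doubles replace $\mathbf{R}_p$ with $p=3b+\lg N+6$ (for the tiny parameters used here, 53 bits comfortably exceed $p$); a naive $O(N^{2})$ cyclic convolution stands in for the FFT-based Convolution call; the truncation is fixed at $\lambda=16$. It also relies on assumed formulas that are not reproduced in this section: $A(X)=X^{N}+2^{-b}X-1$, $\alpha_{k,r}=\frac{k}{k+r}\binom{(k+r)/N}{r}(-2^{-b})^{r}$ (Lagrange inversion) and $\beta_{k,r}=\binom{-k/N}{r}(-2^{-b})^{r}$. Since $U$ and $V$ have degree less than $N$, their images modulo $A(X)$ are $U$ and $V$ themselves. The helper names are hypothetical.

```python
from math import factorial


def gen_binom(x: float, r: int) -> float:
    num = 1.0
    for i in range(r):
        num *= x - i
    return num / factorial(r)


def alpha_coeff(k, r, N, b):
    # Assumed Lagrange-inversion formula for alpha_{k,r}.
    if k == 0:
        return 1.0 if r == 0 else 0.0
    return k / (k + r) * gen_binom((k + r) / N, r) * (-(2.0 ** -b)) ** r


def beta_coeff(k, r, N, b):
    # Assumed: coefficient of z^(k+r) in z^k (1 - 2^-b z)^(-k/N).
    return gen_binom(-k / N, r) * (-(2.0 ** -b)) ** r


def mul_x_mod_A(H, b):
    out = [H[-1]] + H[:-1]  # X^N = 1 - 2^-b X  (mod A(X))
    out[1] -= 2.0 ** -b * H[-1]
    return out


def alpha_star(F, b, lam):
    N = len(F)
    G = [0.0] * N
    for r in range(lam):
        for k in range(N):
            G[(k + r) % N] += alpha_coeff(k, r, N, b) * F[k]
    return G


def beta_star(F, b, lam):
    N = len(F)
    out = [0.0] * N
    for r in range(lam):
        G = [beta_coeff(k, r, N, b) * F[k] for k in range(N)]
        for _ in range(r):
            G = mul_x_mod_A(G, b)
        out = [o + g for o, g in zip(out, G)]
    return out


def cyclic_convolution(F, G):
    # Naive O(N^2) stand-in for the FFT-based Convolution call in line 4.
    N = len(F)
    W = [0.0] * N
    for i in range(N):
        for j in range(N):
            W[(i + j) % N] += F[i] * G[j]
    return W


def low_product(u: int, v: int, N: int, b: int, lam: int = 16) -> int:
    """Toy double-precision low product: returns u * v mod 2^(N b)."""
    mask = (1 << b) - 1
    U = [float((u >> (b * i)) & mask) for i in range(N)]  # u = U(2^b)
    V = [float((v >> (b * i)) & mask) for i in range(N)]  # v = V(2^b)
    W_tilde = cyclic_convolution(alpha_star(U, b, lam), alpha_star(V, b, lam))
    W_bar = beta_star(W_tilde, b, lam)                     # approximates L = UV mod A(X)
    scaled = [round(2.0 ** b * w) for w in W_bar]          # = 2^b L_i (Proposition 3.1)
    total = sum(s << (b * i) for i, s in enumerate(scaled))  # = 2^b L(2^b)
    return (total >> b) % (1 << (N * b))


u, v = 0xDEADBEEF, 0x12345678
print(low_product(u, v, N=8, b=4) == (u * v) % 2**32)
```

The rounding step is where the cancellation trick pays off: only $N$ coefficients are ever convolved, yet the exact low $Nb$ bits come out, because $L(2^{b})\equiv uv\pmod{2^{Nb}}$.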

4. The high product

The discussion for the high product runs along similar lines to the low product, with one additional technical complication. The polynomial $B(X)$ that naturally replaces $A(X)$ in the cancellation trick (see Proposition 4.2) has $N$ roots near the roots of $X^{N}-1$ , just like $A(X)$ , but it also has a real root near $2^{b}$ . Some extra work is needed to handle this additional root.

Remark 4.1.

The asymmetry between the high and low products is somewhat mysterious. Perhaps it is related to the fact that in integer arithmetic, carries always propagate towards the most significant bits. The author has so far been unable to find a way of avoiding the annoying additional root.

Throughout this section we continue to assume that (3.1) holds, i.e., that $b\geqslant 4$ and $N\geqslant 3$ .

4.1. The cancellation trick

We begin with a suitable analogue of Proposition 3.1. To motivate our strategy, recall that the cancellation trick for the low product relied on the fact that

[TABLE]

Working modulo $A(X)$ has the effect of shifting the high-order coefficients downwards by $N$ coefficients, while at the same time multiplying them by $1-2^{-b}X$ , so that they will later cancel out when we evaluate at $X=2^{b}$ . For the high product, we want to instead multiply the low-order coefficients by $1-2^{-b}X$ (to make them later cancel out), and simultaneously shift the high-order coefficients downward by $N$ coefficients. We can accomplish these goals together by multiplying by $1-2^{-b}X$ modulo a polynomial $B(X)$ with the property that

[TABLE]

More precisely, we have the following result.

Proposition 4.2.

Let $W(X)\in\mathbf{Z}[X]$ with $\deg W\leqslant 2N$ , say

[TABLE]

Let $H(X)\in\mathbf{R}[X]$ be the remainder on dividing $(1-2^{-b}X)W(X)$ by

[TABLE]

with $\deg H<N+1$ . Then $2^{b}H(X)\in\mathbf{Z}[X]$ and

[TABLE]

Proof.

Write

[TABLE]

Multiplying by $1-2^{-b}X$ , and using the congruence $1-2^{-b}X\equiv X^{-N}\pmod{B(X)}$ , we obtain

[TABLE]

The polynomial on the right hand side has degree at most $N$ , so we deduce that

[TABLE]

This shows that $2^{b}H(X)\in\mathbf{Z}[X]$ , and the result follows on substituting $X=2^{b}$ . ∎
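Proposition 4.2 is an exact statement, so it can be checked with rational arithmetic. The Python sketch below assumes $B(X)=X^{N+1}-2^{b}X^{N}+2^{b}$, the monic polynomial consistent with the congruence $1-2^{-b}X\equiv X^{-N}\pmod{B(X)}$ and with the identity $\rho-2^{b}=-2^{b}/\rho^{N}$ used in Section 4.2. It computes $H$ by long division with `fractions.Fraction` and confirms two facts: $2^{b}H$ has integer coefficients, and $2^{Nb}H(2^{b})=W(2^{b})-\sum_{i<N}W_{i}2^{bi}$, i.e. the low-order coefficients cancel on evaluation at $2^{b}$. The second identity is derived from the proof above rather than quoted from the statement; the helper names are hypothetical.

```python
from fractions import Fraction


def poly_rem_monic(P: list, M: list) -> list:
    """Remainder of P modulo a monic M (coefficient lists, lowest degree first)."""
    R = list(P)
    d = len(M) - 1
    for i in range(len(R) - 1, d - 1, -1):
        q = R[i]                      # leading coefficient of the current dividend
        for j in range(d + 1):
            R[i - d + j] -= q * M[j]  # subtract q * X^(i-d) * M(X)
    return R[:d]


def high_remainder(W: list, N: int, b: int) -> list:
    """H = (1 - 2^-b X) W(X) mod B(X), with B(X) = X^(N+1) - 2^b X^N + 2^b (assumed)."""
    c = Fraction(1, 2**b)
    P = [Fraction(0)] * (len(W) + 1)
    for i, w in enumerate(W):
        P[i] += w
        P[i + 1] -= c * w
    B = [Fraction(0)] * (N + 2)
    B[0], B[N], B[N + 1] = Fraction(2**b), Fraction(-(2**b)), Fraction(1)
    return poly_rem_monic(P, B)


N, b = 4, 4
U = [3, 15, 0, 7, 9]   # N + 1 chunks of b bits, least significant first
V = [12, 1, 8, 15, 6]
W = [sum(U[i] * V[k - i] for i in range(N + 1) if 0 <= k - i <= N) for k in range(2 * N + 1)]
H = high_remainder(W, N, b)
x = 2**b
print(all((x * h).denominator == 1 for h in H))    # 2^b H(X) has integer coefficients
lhs = sum(h * x**i for i, h in enumerate(H)) * x**N  # 2^(Nb) H(2^b)
rhs = sum(w * x**i for i, w in enumerate(W)) - sum(W[i] * x**i for i in range(N))
print(lhs == rhs)                                    # the low coefficients cancel exactly
```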

4.2. The roots of $B(X)$

The next result isolates the auxiliary real root of $B(X)$ .

Lemma 4.3.

The polynomial $B(X)$ has a unique real root $\rho$ in the interval

[TABLE]

In particular,

[TABLE]

Remark 4.4.

It turns out that $\rho$ is very close to $2^{b}(1-2^{-Nb})$ , i.e., the midpoint of the interval (4.1). In fact, one can develop a series expansion for $\rho$ , whose first few terms are given by

[TABLE]

but we will not prove this.

Proof.

It is convenient to make the transformation

[TABLE]

Our goal is to show that $P(Y)$ has a unique real root in the interval $(1-2\epsilon,1)$ .

Clearly $P(1)=\epsilon>0$ . We claim that $P(1-2\epsilon)<0$ . To see this, observe that

[TABLE]

From (3.1) we have $\epsilon=2^{-Nb}\leqslant 2^{-4N}$ and then $(1-2\epsilon)^{N}\geqslant(1-2^{-4N+1})^{N}>\frac{1}{2}$ , so indeed $P(1-2\epsilon)<0$ . By the intermediate value theorem, $P(Y)$ has at least one root in $(1-2\epsilon,1)$ .

To prove that there is exactly one root, we will show that $P^{\prime}(Y)>0$ throughout the interval. We have $P^{\prime}(Y)=Y^{N-1}((N+1)Y-N)$ , so it suffices to show that $1-2\epsilon>N/(N+1)$ , i.e., that $2\epsilon<1/(N+1)$ . This is clear as $2\epsilon=2^{-Nb+1}\leqslant 2^{-4N+1}$ .

Finally, (4.2) follows immediately from (4.1), after taking into account (3.1). ∎

We note for future use the identity

[TABLE]

which follows immediately from the fact that $B(\rho)=0$ .
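The auxiliary root is easy to compute in practice. Following the suggestion in the proof of Proposition 4.23, this Python sketch runs Newton's method on $B(z)=0$ from $z=2^{b}$, assuming $B(X)=X^{N+1}-2^{b}X^{N}+2^{b}$ (the helper name `find_rho` is hypothetical). It then checks the interval of Lemma 4.3, the consequence $\rho-2^{b}=-2^{b}/\rho^{N}$ of (4.3), and how close $\rho$ is to the midpoint $2^{b}(1-2^{-Nb})$ mentioned in Remark 4.4. Plain doubles suffice only because $N$ and $b$ are tiny here.

```python
def find_rho(N: int, b: int, iters: int = 8) -> float:
    """Newton's method for B(z) = z^(N+1) - 2^b z^N + 2^b from z = 2^b (assumed form of B)."""
    x = 2.0 ** b
    z = x
    for _ in range(iters):
        Bz = z ** (N + 1) - x * z**N + x
        dBz = (N + 1) * z**N - N * x * z ** (N - 1)
        z -= Bz / dBz
    return z


N, b = 3, 4
rho = find_rho(N, b)
eps = 2.0 ** (-N * b)
print(2**b * (1 - 2 * eps) < rho < 2**b)           # rho lies in the interval of Lemma 4.3
print(abs((rho - 2**b) + 2**b / rho**N))           # rho - 2^b = -2^b / rho^N
print(abs(rho - 2**b * (1 - eps)) / (2**b * eps))  # small: rho sits near the midpoint
```

The first Newton step from $2^{b}$ already lands exactly on the midpoint $2^{b}(1-\epsilon)$, which is one way to see why, as Remark 4.25 notes, very little work is needed when $Nb\gg p$.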

Next consider the polynomial

[TABLE]

The coefficients of $C(X)$ are given explicitly as follows.

Lemma 4.5.

We have

[TABLE]

Proof.

First observe that

[TABLE]

From (4.3) we have $\rho-2^{b}=-2^{b}/\rho^{N}$ and hence

[TABLE]

Lemma 4.6.

The roots of $C(X)$ lie in $D_{2}$ , and they are all simple.

Proof.

If $z$ is a root of $C(X)$ and $|z|\geqslant 1$ , then

[TABLE]

so by (4.2) and (3.1) we obtain

[TABLE]

If $C(X)$ had a multiple root, say $z$ , then $z$ would also be a multiple root of $B(X)$ . This would imply that

[TABLE]

which in turn forces $z=2^{b}N/(N+1)$ (since clearly $z\neq 0$ ). This contradicts the previous paragraph, as $2^{b}N/(N+1)$ does not lie in $D_{2}$ . ∎

Lemma 4.3 and Lemma 4.6 together imply that $B(X)$ has $N+1$ distinct roots, namely, the $N$ roots of $C(X)$ , and the auxiliary root $\rho$ . Figure 3 illustrates the case $N=12$ , $b=1$ .

Now consider the function

[TABLE]

This is the same as the definition of $\beta(z)$ in Section 3.2, except that the exponent $-1/N$ has been replaced by $1/N$ . The roots of $C(X)$ lie well within the domain of definition of $\delta(z)$ . The auxiliary root $\rho$ is also inside the domain, but lies very close to the boundary.

Lemma 4.7.

The function $\delta(z)$ maps roots of $B(X)$ to roots of $X^{N}-1$ .

Proof.

If $z$ is a root of $B(X)$ , then

[TABLE]

Of course, $\delta(z)$ cannot yield a bijection between the roots of $B(X)$ and those of $X^{N}-1$ , as $B(X)$ has too many roots. In a moment we will see that we do get a bijection if we restrict to the roots of $C(X)$ . (It turns out that $\delta(\rho)=1$ , so $\delta$ maps precisely two roots of $B(X)$ to $1$ . We omit the easy proof.)

For any $k\in\mathbf{Z}$ , the function $\delta(z)^{k}$ is represented on $D_{2^{b}}$ by the series

[TABLE]

where

[TABLE]

Again, $\delta_{k,r}$ is identical to $\beta_{k,r}$ , except that $N$ has the opposite sign. The first few terms in the expansion of $\delta(z)$ are

[TABLE]

Let $\gamma(z)\in z\,\mathbf{R}[[z]]$ be the formal series inverse of $\delta(z)$ .

Lemma 4.8.

For any $k\geqslant 0$ we have (formally)

[TABLE]

where $\gamma_{k,0}\coloneqq 1$ and

[TABLE]

In particular, the first few terms of $\gamma(z)$ are

[TABLE]

Proof.

Same as the proof of Lemma 3.4, with $N$ replaced by $-N$ everywhere. ∎

Lemma 4.9.

For all $r\geqslant 0$ and $0\leqslant k<N$ we have

[TABLE]

Proof.

The bound for $\delta_{k,r}$ follows by the same argument used for $\beta_{k,r}$ in the proof of Lemma 3.6. For $\gamma_{k,r}$ , observe that for $r\geqslant 1$ we have

[TABLE]

Stirling’s formula implies that $r^{r}/r!\leqslant e^{r}$ , so since $N\geqslant 3$ we obtain

[TABLE]

Remark 4.10.

The constant $-2$ in the above bound for $\gamma_{k,r}$ ensures that the statement is correct for all $k$ and $r$ , but asymptotically speaking it is not really necessary. In fact one can prove that for any $\epsilon>0$ , there exist $N_{0}$ and $r_{0}$ such that $|\gamma_{k,r}|\leqslant 2^{-r(b-\epsilon)}$ for all $0\leqslant k<N$ , whenever $N\geqslant N_{0}$ and $r\geqslant r_{0}$ .

Corollary 4.11.

The series for $\gamma(z)$ and $\delta(z)$ converge on $D_{2^{b-2}}$ and $D_{2^{b}}$ respectively, and

[TABLE]

Proof.

Same as the proof of Corollary 3.7, first using Lemma 4.9 to show that $\delta(z)$ maps $D_{2}$ into $D_{3}\subseteq D_{2^{b-2}}$ and that $\gamma(z)$ maps $D_{2}$ into $D_{4}\subseteq D_{2^{b}}$ . ∎

Corollary 4.12.

The functions $\gamma(z)$ and $\delta(z)$ induce mutually inverse bijections between the roots of $X^{N}-1$ and the roots of $C(X)$ .

Proof.

Similar to the proof of Corollary 3.8. ∎
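Lemma 4.7, Corollary 4.12 and the parenthetical remark $\delta(\rho)=1$ can likewise be checked numerically. This Python sketch assumes $\delta(z)=z(1-2^{-b}z)^{1/N}$ (principal branch), which is what the description in Section 4.2 suggests, together with $B(X)=X^{N+1}-2^{b}X^{N}+2^{b}$; under these assumptions $B(z)=0$ is equivalent to $z^{N}(1-2^{-b}z)=1$, so $\delta$ indeed sends roots of $B(X)$ to roots of $X^{N}-1$. Newton's method on $B$ started at each $N$-th root of unity finds the $N$ roots of $C(X)$; started at $2^{b}$ it finds $\rho$. All helper names are hypothetical.

```python
import cmath


def B(z, N, b):
    return z ** (N + 1) - 2**b * z**N + 2**b


def dB(z, N, b):
    return (N + 1) * z**N - N * 2**b * z ** (N - 1)


def newton_B(z, N, b, iters=40):
    for _ in range(iters):
        z -= B(z, N, b) / dB(z, N, b)
    return z


def delta(z, N, b):
    """delta(z) = z (1 - 2^-b z)^(1/N), principal branch (assumed definition)."""
    return z * (1 - 2.0 ** -b * z) ** (1.0 / N)


N, b = 6, 4
roots_C = [newton_B(cmath.exp(2j * cmath.pi * k / N), N, b) for k in range(N)]
rho = newton_B(2.0 ** b, N, b)
for k, z in enumerate(roots_C):
    omega = cmath.exp(2j * cmath.pi * k / N)
    # Lemma 4.6 and Corollary 4.12: z lies in D_2 and delta(z) is the k-th root of unity.
    print(k, abs(z) < 2, abs(delta(z, N, b) - omega))
print(delta(rho, N, b))  # close to 1: the auxiliary root also maps to 1
```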

4.3. Ring isomorphisms

In this section we will first construct maps

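$$\gamma^{*}\colon\mathbf{R}[X]/C(X)\to\mathbf{R}[X]/(X^{N}-1),\qquad\delta^{*}\colon\mathbf{R}[X]/(X^{N}-1)\to\mathbf{R}[X]/C(X)$$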

analogous to the maps $\alpha^{*}$ and $\beta^{*}$ defined in Section 3.3. Note that these maps do not yet take into account the auxiliary root $\rho$ .

For each $r\geqslant 0$ define linear maps

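$$\gamma^{*}_{r}\colon\mathbf{R}[X]/C(X)\to\mathbf{R}[X]/(X^{N}-1),\qquad\delta^{*}_{r}\colon\mathbf{R}[X]/(X^{N}-1)\to\mathbf{R}[X]/C(X)$$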

by the formulas

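$$\gamma^{*}_{r}F\coloneqq X^{r}\sum_{k=0}^{N-1}\gamma_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/(X^{N}-1),\qquad\delta^{*}_{r}F\coloneqq X^{r}\sum_{k=0}^{N-1}\delta_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/C(X).$$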

As in Section 3.3 we have the following norm bounds.

Lemma 4.13.

For any $r\geqslant 0$ and $F\in\mathbf{R}[X]/C(X)$ ,

[TABLE]

Proof.

Similar to the proof of Lemma 3.9, using Lemma 4.9 to bound the series coefficients. ∎

Lemma 4.14.

For any $r\geqslant 0$ and $F\in\mathbf{R}[X]/(X^{N}-1)$ ,

[TABLE]

Proof.

As in the proof of Lemma 3.10, we must first work out the effect of multiplication by $X$ modulo $C(X)$ . Let $H=\sum_{k=0}^{N-1}H_{k}X^{k}\in\mathbf{R}[X]/C(X)$ . The formula (4.4) implies that

[TABLE]

We have $2^{b}/\rho<2$ and $2^{b}/\rho^{i}<1$ for $i\geqslant 2$ (due to (4.2) and (3.1)), so we find that $\|XH\|\leqslant 2\|H\|$ .

The rest of the argument is the same as the proof of Lemma 3.10, noting that $\delta_{r}^{*}F=X^{r}G$ for $G\coloneqq\sum_{k=0}^{N-1}\delta_{k,r}F_{k}X^{k}\in\mathbf{R}[X]/C(X)$ , and using Lemma 4.9. ∎
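The multiplication-by-$X$ step in this proof needs the coefficients of $C(X)$. Lemma 4.5 is not reproduced here, so the Python sketch below derives them by synthetic division of the assumed $B(X)=X^{N+1}-2^{b}X^{N}+2^{b}$ by $X-\rho$, and compares the result with the closed form $C(X)=X^{N}-2^{b}\sum_{i=0}^{N-1}\rho^{-(i+1)}X^{i}$, obtained from $\rho-2^{b}=-2^{b}/\rho^{N}$. The closed form gives $X^{N}\equiv\sum_{i}2^{b}\rho^{-(i+1)}X^{i}\pmod{C(X)}$, which is what the hypothetical `mul_x_mod_C` uses; the observed bound $\|XH\|\leqslant 2\|H\|$ reflects $2^{b}/\rho<2$ and $2^{b}/\rho^{i}<1$ for $i\geqslant 2$ (norm taken as the maximum absolute coefficient, an assumption).

```python
def find_rho(N: int, b: int, iters: int = 8) -> float:
    """Newton's method on B(z) = z^(N+1) - 2^b z^N + 2^b from z = 2^b (assumed B)."""
    x = 2.0 ** b
    z = x
    for _ in range(iters):
        z -= (z ** (N + 1) - x * z**N + x) / ((N + 1) * z**N - N * x * z ** (N - 1))
    return z


def C_synthetic(N: int, b: int, rho: float) -> list:
    """Coefficients c_0, ..., c_N of C(X) = B(X) / (X - rho) by synthetic division."""
    B = [0.0] * (N + 2)
    B[0], B[N], B[N + 1] = 2.0 ** b, -(2.0 ** b), 1.0
    C = [0.0] * (N + 1)
    C[N] = B[N + 1]
    for i in range(N, 0, -1):
        C[i - 1] = B[i] + rho * C[i]
    return C


def mul_x_mod_C(H: list, b: int, rho: float) -> list:
    """Multiply H (degree < N) by X in R[X]/C(X), using X^N = sum_i 2^b rho^-(i+1) X^i."""
    N = len(H)
    out = [0.0] + H[:-1]
    for i in range(N):
        out[i] += H[-1] * 2.0 ** b / rho ** (i + 1)
    return out


N, b = 6, 4
rho = find_rho(N, b)
C = C_synthetic(N, b, rho)
closed = [-(2.0 ** b) / rho ** (i + 1) for i in range(N)] + [1.0]
print(max(abs(c - d) for c, d in zip(C, closed)))  # tiny: the two forms agree
H = [0.5, -1.0, 2.0, -3.0, 1.5, 4.0]
XH = mul_x_mod_C(H, b, rho)
print(max(map(abs, XH)) <= 2 * max(map(abs, H)))   # ||XH|| <= 2 ||H||
```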

We now define $\gamma^{*}$ and $\delta^{*}$ by setting

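$$\gamma^{*}F\coloneqq\sum_{r=0}^{\infty}\gamma^{*}_{r}F,\qquad\delta^{*}F\coloneqq\sum_{r=0}^{\infty}\delta^{*}_{r}F.$$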

The next five statements are proved along the same lines as the corresponding results in Section 3.3, i.e., from Lemma 3.11 up to Corollary 3.16.

Lemma 4.15.

For any $F\in\mathbf{R}[X]/C(X)$ and any integer $\lambda\geqslant 0$ , we have

[TABLE]

Lemma 4.16.

For any $F\in\mathbf{R}[X]/(X^{N}-1)$ and any integer $\lambda\geqslant 0$ , we have

[TABLE]

Lemma 4.17.

Let $F\in\mathbf{R}[X]/C(X)$ , and let $z$ be a root of $X^{N}-1$ . Then

[TABLE]

Lemma 4.18.

Let $F\in\mathbf{R}[X]/(X^{N}-1)$ , and let $z$ be a root of $C(X)$ . Then

[TABLE]

Corollary 4.19.

The maps $\gamma^{*}$ and $\delta^{*}$ are mutually inverse ring isomorphisms between $\mathbf{R}[X]/C(X)$ and $\mathbf{R}[X]/(X^{N}-1)$ .

Now we bring $\rho$ back into the picture. We will define maps

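$$\gamma^{\dagger}\colon\mathbf{R}[X]/B(X)\to\mathbf{R}[X]/(X^{N}-1)\oplus\mathbf{R},\qquad\delta^{\dagger}\colon\mathbf{R}[X]/(X^{N}-1)\oplus\mathbf{R}\to\mathbf{R}[X]/B(X)$$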

in terms of $\gamma^{*}$ and $\delta^{*}$ , as follows.

First, for $F\in\mathbf{R}[X]/B(X)$ , we define

$$\gamma^{\dagger}F\coloneqq\bigl(\gamma^{*}(F\bmod C(X)),\ \rho^{-N}F(\rho)\bigr).$$

The map $\gamma^{\dagger}$ is a linear isomorphism, thanks to the Chinese remainder theorem applied to the relatively prime moduli $C(X)$ and $X-\rho$ . However, $\gamma^{\dagger}$ is not quite a ring isomorphism, i.e., is not multiplicative, due to the scaling factor $\rho^{-N}$ . Note that the second component of $\gamma^{\dagger}F$ may be written more explicitly as follows: if $F=F_{0}+F_{1}X+\cdots+F_{N}X^{N}$ , then

$$\rho^{-N}F(\rho)=F_{N}+\rho^{-1}F_{N-1}+\cdots+\rho^{-N}F_{0}.$$
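The Chinese remainder decomposition behind $\gamma^{\dagger}$ can be sketched without the $\gamma^{*}$ step. The Python below (hypothetical helpers `split_crt` and `join_crt`) sends $F$ to the pair $(F\bmod C(X),\ \rho^{-N}F(\rho))$, evaluating the second component by Horner's rule in $\rho^{-1}$ as $F_{N}+\rho^{-1}F_{N-1}+\cdots$, and then rebuilds $F$ as $R+\psi C(X)$ with $\psi$ chosen to match the value at $\rho$. It assumes the closed form $C(X)=X^{N}-2^{b}\sum_{i=0}^{N-1}\rho^{-(i+1)}X^{i}$, derived from the assumed $B(X)=X^{N+1}-2^{b}X^{N}+2^{b}$. This is not the paper's $\delta^{\dagger}$, which also applies $\delta^{*}$ and the factor $1-2^{-b}X$; it only illustrates the linear isomorphism given by the two moduli $C(X)$ and $X-\rho$.

```python
def find_rho(N: int, b: int, iters: int = 8) -> float:
    """Newton's method on B(z) = z^(N+1) - 2^b z^N + 2^b from z = 2^b (assumed B)."""
    x = 2.0 ** b
    z = x
    for _ in range(iters):
        z -= (z ** (N + 1) - x * z**N + x) / ((N + 1) * z**N - N * x * z ** (N - 1))
    return z


def split_crt(F: list, b: int, rho: float):
    """F = F_0 + ... + F_N X^N  ->  (F mod C(X), rho^-N F(rho)).

    Uses X^N = sum_i 2^b rho^-(i+1) X^i modulo C(X) (assumed closed form of C).
    """
    N = len(F) - 1
    R = [F[i] + F[N] * 2.0 ** b / rho ** (i + 1) for i in range(N)]
    theta = 0.0
    for f in F:  # Horner in 1/rho: F_N + F_{N-1}/rho + ... + F_0/rho^N
        theta = theta / rho + f
    return R, theta


def join_crt(R: list, theta: float, b: int, rho: float) -> list:
    """Inverse: the unique F of degree <= N with F = R mod C(X) and F(rho) = rho^N theta."""
    N = len(R)
    C = [-(2.0 ** b) / rho ** (i + 1) for i in range(N)] + [1.0]
    R_rho = sum(r * rho**i for i, r in enumerate(R))
    C_rho = sum(c * rho**i for i, c in enumerate(C))
    psi = (rho**N * theta - R_rho) / C_rho
    return [ri + psi * ci for ri, ci in zip(R + [0.0], C)]


N, b = 4, 4
rho = find_rho(N, b)
F = [1.0, -2.0, 0.5, 3.0, -1.5]
R, theta = split_crt(F, b, rho)
print(max(abs(f - g) for f, g in zip(F, join_crt(R, theta, b, rho))))  # round-off only
```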

In the other direction, for

$$(F,\theta)\in\mathbf{R}[X]/(X^{N}-1)\oplus\mathbf{R},$$

we define $\delta^{\dagger}(F,\theta)$ to be the unique polynomial $G\in\mathbf{R}[X]/B(X)$ such that

[TABLE]

Again, $\delta^{\dagger}$ is a linear isomorphism, but not a ring isomorphism.

In fact, $\gamma^{\dagger}$ and $\delta^{\dagger}$ are not even inverse to each other. Instead, they have been cooked up to satisfy the following relation.

Lemma 4.20.

For any $F,G\in\mathbf{R}[X]/B(X)$ we have

[TABLE]

(The reason for including the factor $1-2^{-b}X$ is to prepare for the use of Proposition 4.2 in the main high product algorithm.)

Proof.

It is enough to check the equality modulo $C(X)$ and modulo $X-\rho$ . It holds modulo $C(X)$ according to Corollary 4.19:

[TABLE]

It holds modulo $X-\rho$ thanks to (4.3):

[TABLE]

Define a norm on $\mathbf{R}[X]/(X^{N}-1)\oplus\mathbf{R}$ by taking

[TABLE]

Then $\gamma^{\dagger}$ and $\delta^{\dagger}$ satisfy the following norm bounds.

Lemma 4.21.

For any $F\in\mathbf{R}[X]/B(X)$ we have

[TABLE]

Proof.

Let $F=F_{0}+\cdots+F_{N}X^{N}\in\mathbf{R}[X]/B(X)$ . Using (4.4), we find that the reduction of $F$ modulo $C(X)$ is given by

[TABLE]

so by (4.2) we obtain

[TABLE]

Lemma 4.15 (with $\lambda=0$ ) then yields

[TABLE]

We also have

[TABLE]

Together these inequalities show that $\|\gamma^{\dagger}F\|\leqslant 3\|F\|$ . ∎

Lemma 4.22.

For any $(F,\theta)\in\mathbf{R}[X]/(X^{N}-1)\oplus\mathbf{R}$ we have

[TABLE]

Proof.

We may write down an explicit formula for $\delta^{\dagger}(F,\theta)$ as follows. Let

[TABLE]

let $\bar{H}\in\mathbf{R}[X]$ be the unique polynomial of degree less than $N$ such that $\bar{H}\equiv H\pmod{C(X)}$ , and let

[TABLE]

Then we claim that

[TABLE]

To prove (4.7), it suffices to verify that it holds modulo $C(X)$ and modulo $X-\rho$ . For $C(X)$ this is clear as $\delta^{\dagger}(F,\theta)\equiv(1-2^{-b}X)\delta^{*}F\pmod{C(X)}$ by the definition of $\delta^{\dagger}$ . It holds modulo $X-\rho$ because by (4.3) and the definition of $\delta^{\dagger}$ we have

[TABLE]

Our goal is now to estimate the size of the coefficients of the polynomial

[TABLE]

Note that $J(X)$ has degree at most $N$ , so its coefficients are exactly the same as those of $\delta^{\dagger}(F,\theta)$ .

Let us first estimate $|\psi|$ . Write $H(X)=H_{0}+\cdots+H_{N-1}X^{N-1}$ , so that also $\bar{H}(X)=H_{0}+\cdots+H_{N-1}X^{N-1}$ . We have

[TABLE]

By Lemma 4.16 (with $\lambda=0$ ) we have $\|H\|=\|\delta^{*}F\|\leqslant\frac{8}{7}\|F\|$ . Using (4.2) and (3.1) we obtain

[TABLE]

From (4.4) and (4.2) we have

[TABLE]

Therefore

[TABLE]

Since $N\geqslant 3$ and $\rho>15$ (again from (4.2) and (3.1)), we find that

[TABLE]

so

[TABLE]

Now we may estimate the size of the coefficients of $J(X)$ . The coefficients of $(1-2^{-b}X)\bar{H}(X)$ are bounded in absolute value by $(1+2^{-b})\|H\|$ , and from (4.4) we see that the coefficients of $\psi\,C(X)$ are bounded in absolute value by $(2^{b}/\rho)|\psi|$ . Therefore, using again (3.1) and (4.2) we find that the coefficients of $J(X)$ are bounded in absolute value by

[TABLE]

Next, we exhibit efficient algorithms for approximating $\gamma^{\dagger}$ and $\delta^{\dagger}$ .

Proposition 4.23 (Approximating $\gamma^{\dagger}$ ).

Given as input $F\in 2^{e}\mathbf{R}_{p}[X]/B(X)$ , we may compute

[TABLE]

such that

[TABLE]

in $O(N\operatorname{\mathsf{M}}(p))$ bit operations, assuming that $p=O(b)$ .

Proof.

We first remark that since $p=O(b)$ , we may precompute $\rho$ to a precision of $p+O(1)$ significant bits using only $O(\log N)$ operations in $\mathbf{R}$ , by using Newton’s method to numerically solve the equation $B(z)=0$ , starting with the initial approximation $z=2^{b}$ . (See also Remark 4.25 below.)

Now, given as input $F\in 2^{e}\mathbf{R}_{p}[X]/B(X)$ as above, we first compute an approximation to $F\bmod C(X)$ using the formula (4.5). The hypothesis $p=O(b)$ , together with the rapid decay of the coefficients of $C(X)$ , implies that this may be done using $O(1)$ operations in $\mathbf{R}$ . We may then compute the desired approximation $G$ to $\gamma^{*}(F\bmod C(X))$ using the same method as in the proof of Proposition 3.17, at a cost of $O(N)$ operations in $\mathbf{R}$ . Finally, we may easily compute the desired approximation $\theta$ to $\rho^{-N}F(\rho)=F_{N}+\rho^{-1}F_{N-1}+\cdots$ using another $O(1)$ operations in $\mathbf{R}$ . ∎

Proposition 4.24 (Approximating $\delta^{\dagger}$ ).

Given as input $F\in 2^{e}\mathbf{R}_{p}[X]/(X^{N}-1)$ and $\theta\in 2^{e}\mathbf{R}_{p}$ , we may compute

[TABLE]

such that

[TABLE]

in $O(N\operatorname{\mathsf{M}}(p))$ bit operations, assuming that $p=O(b)$ .

Proof.

The algorithm amounts to evaluating the explicit formula (4.7). We first approximate $H=\delta^{*}F$ using the same method as in the proof of Proposition 3.18, at a cost of $O(N)$ operations in $\mathbf{R}$ . (This requires $O(1)$ more operations than the corresponding algorithm for $\beta^{*}$ , because the reductions modulo $C(X)$ involve a few more terms than those modulo $A(X)$ .) We then approximate $\psi$ at a cost of $O(1)$ operations, and evaluate (4.7) in another $O(N)$ operations. ∎

Remark 4.25.

In practice we always have $Nb\gg p$ , and this assumption allows several simplifications to be made to the algorithms in Proposition 4.23 and Proposition 4.24. First, the trivial approximation $\rho\approx 2^{b}$ is already correct to $p+O(1)$ significant bits, so Newton’s method is not required. In addition, we have $C(\rho)\approx\rho^{N}\approx 2^{Nb}$ , so instead of the complicated formula (4.6) for $\psi$ , we may simply use the approximation $\psi\approx\theta$ .

Finally we may state the main high product algorithm, and prove the main theorem concerning its correctness and complexity.

Proof of Theorem 2.3.

Line 2 decomposes $u$ and $v$ into $N+1$ chunks of $b$ bits. The splitting boundaries are different to those used for the full product and low product: here $u_{N}$ consists of the $b$ most significant bits of $u$ , then $u_{N-1}$ the next lower $b$ bits, and so on. The hypothesis $N+1\geqslant n/b$ ensures that this splitting is possible.
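The top-aligned splitting in line 2 can be sketched as follows. The alignment convention for the lowest chunk is an assumption, not taken from the text: when $n<(N+1)b$, the sketch pads $u$ with $s=(N+1)b-n$ zero bits at the bottom, so that $2^{s}u=\sum_{i=0}^{N}u_{i}2^{bi}$ and $u_{N}$ is exactly the top $b$ bits of $u$. The helper name `split_from_top` is hypothetical.

```python
def split_from_top(u: int, n: int, N: int, b: int) -> list:
    """Split an n-bit integer u into N + 1 chunks of b bits, u_N being the b most
    significant bits (hypothetical helper; assumes n <= (N + 1) b).

    The lowest chunk is padded with s = (N + 1) b - n zero bits, so that
    u * 2^s == sum(u_i * 2^(b i)).
    """
    s = (N + 1) * b - n
    if s < 0:
        raise ValueError("need N + 1 >= n / b")
    shifted = u << s
    mask = (1 << b) - 1
    return [(shifted >> (b * i)) & mask for i in range(N + 1)]


u, n, N, b = 0b1011_0110_1110_0101_1, 17, 4, 4
chunks = split_from_top(u, n, N, b)
print(chunks[N] == u >> (n - b))  # u_N holds the top b bits of u
```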

As in the proof of Theorem 2.1, let

[TABLE]

so that

[TABLE]

Let

[TABLE]

The polynomials $\bar{U}$ and $\bar{V}$ in line 2 are just the images of $U$ and $V$ in $2^{b}\mathbf{R}_{p}[X]/B(X)$ . Our goal is to compute $H(X)$ , the remainder on dividing $(1-2^{-b}X)W(X)$ by $B(X)$ , as in Proposition 4.2. By definition this is equal to $(1-2^{-b}X)\bar{U}\bar{V}\pmod{B(X)}$ .

Line 3 computes approximations $(\tilde{U},\theta_{U})$ and $(\tilde{V},\theta_{V})$ to $\gamma^{\dagger}\bar{U}$ and $\gamma^{\dagger}\bar{V}$.
Line 4 computes $\tilde{W}$, an approximation to $\tilde{U}\tilde{V}$, and $\theta_{W}$, an approximation to $\theta_{U}\theta_{V}\in\mathbf{R}$. The latter involves just a single real multiplication.
Let us write $\bar{U}^{\prime}$ and $\bar{V}^{\prime}$ for the images of $\bar{U}$ and $\bar{V}$ in $\mathbf{R}[X]/C(X)$. A similar calculation to that used in the proof of Theorem 2.2 shows that

[TABLE]

and that

[TABLE]

These two inequalities may be expressed more briefly in combination by writing

[TABLE]

Line 5 computes an approximation $\bar{W}$ to $\delta^{\dagger}(\tilde{W},\theta_{W})$. Using Lemma 4.20 and Lemma 4.22 we find that

[TABLE]

As noted earlier, the coefficients of $H(X)$ are exactly those of $(1-2^{-b}X)\bar{U}\bar{V}$. We know from Proposition 4.2 that $2^{b}H(X)$ has integer coefficients, so the error bound above lets us deduce that $\operatorname{round}(2^{b}\bar{W}_{i})=2^{b}H_{i}$ for each $i=0,\ldots,N$. Thus the value $t$ computed in line 6 satisfies

[TABLE]

Applying Proposition 4.2, we obtain

[TABLE]

We have $w_{i}\leqslant 2^{2b}N$ for $i=0,\ldots,N-1$, since each such $w_{i}$ is a sum of $i+1\leqslant N$ terms of the form $u_{j}v_{k}$ with $0\leqslant u_{j},v_{k}<2^{b}$, so

[TABLE]

The hypothesis $(N+1)b\geqslant n+\lg N+2$ then yields

[TABLE]

Let $w\coloneqq\operatorname{round}(t)$ be the value returned in line 7. Since $0\leqslant uv<2^{2n}$ , the inequality (4.8) implies that $-\frac{1}{2}<t<2^{n}$ , and hence that $0\leqslant w\leqslant 2^{n}$ . Moreover, since $|t-w|\leqslant\frac{1}{2}$ , we conclude that

[TABLE]

as desired. The running time analysis is essentially the same as in the proof of Theorem 2.2. ∎
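The top-aligned splitting of line 2 and the counting argument behind $w_{i}\leqslant 2^{2b}N$ can be checked on small numbers. The following is a minimal sketch; the helper names are hypothetical, and zero-filling the spare low-order bits of $u_{0}$ when $(N+1)b>n$ is an assumed convention, chosen so that $u_{N}$ holds the $b$ most significant bits of $u$ as described above.

```python
import math

def split_top_aligned(u, n, b, N):
    """Split an n-bit integer u into chunks u_0..u_N of b bits, aligned at the top:
    u_N holds the b most significant bits of u, u_{N-1} the next b bits, etc.

    Assumed convention (hypothetical): when (N+1)*b > n the spare low-order
    positions of u_0 are zero, so 2**s * u == sum(u_i * 2**(i*b)), s = (N+1)*b - n.
    """
    if not 0 <= u < (1 << n):
        raise ValueError("u must be a non-negative n-bit integer")
    s = (N + 1) * b - n
    if s < 0:
        raise ValueError("splitting needs N + 1 >= n / b")
    shifted = u << s                 # exactly (N+1)*b bits wide
    mask = (1 << b) - 1
    return [(shifted >> (i * b)) & mask for i in range(N + 1)]

def low_coefficients(uc, vc, count):
    """Coefficients w_0..w_{count-1} of W(X) = U(X) V(X) (schoolbook convolution).
    w_i is a sum of i+1 products u_j * v_{i-j}, each below 2^(2b)."""
    return [sum(uc[j] * vc[i - j] for j in range(i + 1)) for i in range(count)]

def satisfies_hypothesis(n, b, N):
    """Check the size hypothesis (N+1) b >= n + lg N + 2."""
    return (N + 1) * b >= n + math.log2(N) + 2

# Small usage example: n = 10, b = 5, N = 2.
n, b, N = 10, 5, 2
uc = split_top_aligned(857, n, b, N)     # top 5 bits of 857 = 0b1101011001 are 0b11010
vc = split_top_aligned(1000, n, b, N)
w = low_coefficients(uc, vc, N)          # w_0, ..., w_{N-1}
assert all(0 <= wi <= 2 ** (2 * b) * N for wi in w)
```

With every chunk at its maximum value $2^{b}-1$, the coefficient $w_{N-1}$ equals $N(2^{b}-1)^{2}$, which shows that the bound $2^{2b}N$ is essentially tight.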

5. Implementation and performance

We wrote an implementation of the new truncated product algorithms in the C programming language, together with a comparable implementation of the full product, to examine to what extent the predicted 25% reduction in complexity can be realised in practice. The source code is available from the author’s web page under a free software license.

The timings reported in Table 1 were measured on a single core of an otherwise idle 2.5GHz Intel Xeon Gold 6248 (Cascade Lake microarchitecture), running Rocky Linux 8.8 (kernel version 4.18.0). We compiled our program using GCC 12.2.0 with the optimisation flags -O3 -mavx2 -mavx512f -ffast-math. In the critical inner loops, our code uses GCC’s vector extensions to take advantage of the AVX2 instruction set available on the target platform.

For the real convolutions, our code relies on the one-dimensional real-to-complex and complex-to-real transforms provided by the FFTW library (version 3.3.10) [5]. We configured FFTW using the --enable-avx2 --enable-avx512 flags, and used FFTW’s “wisdom” facility with the FFTW_MEASURE option to find efficient transform sequences for all relevant transform lengths.
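To make the baseline concrete, here is a toy full product built on a floating-point cyclic convolution: split into $b$-bit chunks, convolve with a double-precision FFT, round each coefficient, then overlap-add. It is only a sketch under simplifying assumptions: it uses a textbook recursive complex radix-2 FFT in pure Python (not FFTW, and not a real-to-complex transform), power-of-two lengths only, and no round-off analysis. The names `fft`, `fft_multiply` and the default `b=8` are hypothetical choices made here.

```python
import cmath

def fft(a, invert=False):
    """Textbook recursive radix-2 FFT; len(a) must be a power of two.
    With invert=True it computes the unscaled inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_multiply(u, v, b=8):
    """Toy full product of non-negative integers u, v (complex FFT, not FFTW)."""
    mask = (1 << b) - 1

    def chunks(x):
        out = []
        while x:
            out.append(x & mask)     # least significant b bits first
            x >>= b
        return out or [0]

    U, V = chunks(u), chunks(v)
    L = 1
    while L < len(U) + len(V):       # long enough that the cyclic product does not wrap
        L *= 2
    fu = fft([complex(c) for c in U] + [0j] * (L - len(U)))
    fv = fft([complex(c) for c in V] + [0j] * (L - len(V)))
    w = fft([x * y for x, y in zip(fu, fv)], invert=True)
    coeffs = [round(z.real / L) for z in w]   # exact only while round-off stays below 1/2
    # Overlap-add: Python integers absorb the carries between chunks.
    return sum(c << (i * b) for i, c in enumerate(coeffs))

print(fft_multiply(255, 255))   # 65025
```

Like the implementation described above, this sketch gives no error guarantee: if $b$ or the transform length grows too large, the rounding step can silently return a wrong coefficient.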

Our implementation differs from the theoretical presentation in Section 3 and Section 4 in several respects:

•

Instead of fixed point arithmetic, we use double-precision floating point (the double data type in C). In particular, this applies to the routines that compute $\alpha^{*}$ , $\beta^{*}$ , $\gamma^{\dagger}$ and $\delta^{\dagger}$ , and also the FFTs and pointwise multiplications. (The splitting and overlap-add steps are handled using integer arithmetic.) We make no attempt to prove any bounds for round-off error. This is impossible anyway in the context of our program, as FFTW does not offer any error guarantees.

•

In the splitting step we allow signed coefficients. For example, we write $u=U(2^{b})$ where the coefficients of $U$ are integers lying in the balanced interval $|U_{i}|\leqslant 2^{b-1}$ . This leads to less coefficient growth in the product $U(X)V(X)$ : instead of these coefficients having roughly $2b+\lg N$ bits, for uniformly random inputs they tend to have around $2b+\frac{1}{2}\lg N$ bits, due to cancellation between the positive and negative terms. Of course, an adversary could easily choose inputs for which every $U_{i}$ and $V_{i}$ is close to $2^{b-1}$ , in which case the product coefficients will have close to $2b+\lg N$ bits. In this case our program will certainly produce incorrect output, unless we decrease $b$ to compensate.
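The balanced splitting in the second item above can be sketched as follows. This is a hypothetical helper, not taken from the author's code: it peels off $b$ bits at a time and maps each digit into $[-2^{b-1},2^{b-1})$, pushing a carry into the next position when a digit is made negative, so that $u=U(2^{b})$ still holds exactly.

```python
def balanced_split(u, b, count):
    """Write a non-negative integer u as sum(U[i] * 2**(i*b)) for i < count,
    with every digit U[i] in the balanced range [-2**(b-1), 2**(b-1)).

    count must leave room for one extra digit, because making a digit
    negative pushes a carry of 1 into the next position.
    """
    if u < 0 or b < 1:
        raise ValueError("need u >= 0 and b >= 1")
    half, base = 1 << (b - 1), 1 << b
    digits = []
    for _ in range(count):
        d = u & (base - 1)        # ordinary digit in [0, 2^b)
        if d >= half:             # shift it into the balanced range
            d -= base
        digits.append(d)
        u = (u - d) >> b          # exact division: u - d is a multiple of 2^b
    if u != 0:
        raise ValueError("count is too small to represent u")
    return digits

def evaluate(digits, b):
    """Evaluate U(2^b) = sum(U[i] * 2**(i*b))."""
    return sum(d << (i * b) for i, d in enumerate(digits))

print(balanced_split(857, 4, 3))   # [-7, 6, 3], since 857 = -7 + 6*16 + 3*256
```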

Table 1 shows timings for our low product, high product, and full product routines for various choices of $n$ (with parameters as indicated in Table 2), as well as timings for the full product computed by the mpz_mul function from the GMP multiple-precision arithmetic library (version 6.2.1) [7]. The reported timings are averages taken over numerous tests; for each entry in the table, the measurements are quite stable, with standard deviation around 1–2% of the average time.

Comparing the performance of our code against GMP is not quite fair, because in principle GMP performs a provably correct computation, whereas the output of our program is not provably correct (as explained above). Nevertheless, the timings demonstrate that our code is competitive with the highly optimised multiplication routines in GMP.

For each $n$ shown in Table 1, we chose the parameters as follows. We ran a large number of tests to determine the maximum possible $b$ for which the program consistently produces the correct output for uniformly random inputs $u$ and $v$ . The parameter $\lambda$ refers to the number of terms used in the approximation of the ring isomorphisms such as $\alpha^{*}$ ; it has the same meaning as in the proof of Proposition 3.17. Again, we chose $\lambda$ by empirical testing, taking the smallest value that led to consistently correct output. Regarding the choice of $N$ , we examined several possible candidates, namely those of the form $N=2^{e_{2}}3^{e_{3}}5^{e_{5}}7^{e_{7}}$ where $n/b\leqslant N\leqslant 1.15n/b$ and $3^{e_{3}}5^{e_{5}}7^{e_{7}}<200$ , and chose the candidate that led to the fastest timings. (In principle the choice of $N$ could also affect correctness, due to different FFT algorithms being used for different $N$ , but in practice we found it did not make a difference.) The resulting values of $N$ , $b$ and $\lambda$ are shown in Table 2; these are the values that were used to produce the timings in Table 1.
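The candidate set for $N$ described above can be enumerated directly. A minimal sketch follows; the function name `candidate_lengths` and its keyword defaults are hypothetical. It returns every $N=2^{e_{2}}3^{e_{3}}5^{e_{5}}7^{e_{7}}$ with $n/b\leqslant N\leqslant 1.15n/b$ whose odd part is below $200$. The step that actually selected $N$, timing each candidate, is not shown.

```python
def candidate_lengths(n, b, slack=1.15, odd_limit=200):
    """All N = 2^e2 * 3^e3 * 5^e5 * 7^e7 with n/b <= N <= slack*n/b and
    odd part 3^e3 * 5^e5 * 7^e7 < odd_limit (hypothetical helper)."""
    lo, hi = n / b, slack * n / b
    odd_parts = []
    p3 = 1
    while p3 < odd_limit:
        p5 = p3
        while p5 < odd_limit:
            p7 = p5
            while p7 < odd_limit:
                odd_parts.append(p7)
                p7 *= 7
            p5 *= 5
        p3 *= 3
    result = set()
    for m in odd_parts:
        N = m
        while N < lo:          # smallest m * 2^e2 that is >= n/b
            N *= 2
        if N <= hi:            # slack < 2, so no larger power of two can fit
            result.add(N)
    return sorted(result)

print(candidate_lengths(1000, 10))   # [100, 105, 108, 112]
```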

The final column in Table 2 gives the ratio between the transform length used for the truncated products (the column labelled $N$ ) and the corresponding transform length for the full product (the column labelled $2N$ ). As discussed in §2.3 (scenario #2), we expect this ratio to be close to $3/4$ . The observed values are reasonably close to $3/4$ , but there is some variation due to the sparsity of available transform lengths. We also predicted in §2.3 that the ratio of the corresponding values of $b$ should be about $2/3$ ; this is borne out clearly in Table 2.
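The two predictions from §2.3 quoted here are consistent with each other, as a back-of-the-envelope calculation shows. For illustration only, assume each transform length is about the number of bits being split divided by the chunk size ($n$ bits for a truncated product, in line with $n/b\leqslant N\leqslant 1.15n/b$ above, and $2n$ output bits for the full product). The symbols $L_{\mathrm{trunc}}$, $L_{\mathrm{full}}$, $b_{\mathrm{trunc}}$, $b_{\mathrm{full}}$ are introduced here and are not the paper's notation.

$$
\frac{L_{\mathrm{trunc}}}{L_{\mathrm{full}}}\approx\frac{n/b_{\mathrm{trunc}}}{2n/b_{\mathrm{full}}}=\frac{1}{2}\cdot\frac{b_{\mathrm{full}}}{b_{\mathrm{trunc}}}\approx\frac{1}{2}\cdot\frac{3}{2}=\frac{3}{4}.
$$

For comparison, the FFT lengths quoted for $n=10^{10}$ in the discussion of Table 1 are $2^{27}\cdot 7$ (truncated) and $2^{27}\cdot 9$ (full), a ratio of $7/9\approx 0.78$.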

Finally, let us discuss the main quantity of interest, namely, the ratio of the running times between the truncated products and the full product. These ratios are shown in parentheses in Table 1. In an ideal world these ratios would be close to $0.75$ . Unfortunately, the values shown in the table fall somewhat short of this goal. As $n$ increases, the performance appears to pass through three distinct phases:

•

In the first few rows of the table, the ratio is close to $1$ . For these small values of $n$ , the savings from the shorter transform lengths in the truncated products are outweighed by the additional cost of evaluating the ring isomorphisms.

•

There is a wide range of intermediate values of $n$ where the ratios for the low product lie roughly between 0.85 and 0.9, i.e., we see a speedup of 10–15% compared to the full product. The high product lags behind the low product by about 2–3%.

•

In the last two rows of the table, there is a serious degradation in performance. This appears to be mainly due to problems with FFTW’s handling of large transforms of composite length.
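The parenthesised ratios in Table 1 are simply the truncated-product time divided by the full-product time in the same row. The sketch below recomputes them from the Table 1 timings, converted to seconds and omitting the GMP column; the names `TABLE1`, `PUBLISHED_RATIOS` and `table1_ratios` are illustrative only.

```python
# Table 1 timings in seconds: n -> (low product, high product, full product).
TABLE1 = {
    1_000_000: (2.90e-3, 2.93e-3, 3.07e-3),
    2_154_434: (7.11e-3, 7.27e-3, 6.95e-3),
    4_641_588: (14.7e-3, 15.2e-3, 15.4e-3),
    10_000_000: (36.2e-3, 37.8e-3, 39.5e-3),
    21_544_346: (88.6e-3, 92.6e-3, 99.2e-3),
    46_415_888: (0.204, 0.210, 0.227),
    100_000_000: (0.504, 0.514, 0.598),
    215_443_469: (1.25, 1.28, 1.37),
    464_158_883: (2.76, 2.81, 3.03),
    1_000_000_000: (6.08, 6.19, 7.05),
    2_154_434_690: (13.9, 14.2, 16.1),
    4_641_588_833: (33.6, 34.6, 33.4),
    10_000_000_000: (109.0, 110.0, 79.8),
}

# Ratios printed in parentheses in Table 1: n -> (low/full, high/full).
PUBLISHED_RATIOS = {
    1_000_000: (0.94, 0.95), 2_154_434: (1.02, 1.05), 4_641_588: (0.95, 0.99),
    10_000_000: (0.92, 0.96), 21_544_346: (0.89, 0.93), 46_415_888: (0.90, 0.93),
    100_000_000: (0.84, 0.86), 215_443_469: (0.91, 0.93), 464_158_883: (0.91, 0.93),
    1_000_000_000: (0.86, 0.88), 2_154_434_690: (0.86, 0.88),
    4_641_588_833: (1.01, 1.04), 10_000_000_000: (1.37, 1.38),
}

def table1_ratios(table=TABLE1):
    """Truncated-vs-full time ratios, rounded to two decimals as in Table 1."""
    return {n: (round(lo / full, 2), round(hi / full, 2))
            for n, (lo, hi, full) in table.items()}

assert table1_ratios() == PUBLISHED_RATIOS
```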

For example, for the low product in the last line, the FFTs of size $2^{27}\cdot 7$ account for roughly 87s of the total 109s, whereas for the corresponding full product, the FFTs of length $2^{27}\cdot 9$ account for about 63s out of the total 80s. One would normally expect the ratio of these FFT times to be about $7/9\approx 0.78$ rather than $87/63\approx 1.38$ . We did not explore the underlying reasons for this discrepancy, but we speculate that it is caused by suboptimal locality in the algorithms that FFTW uses to decompose a large composite-length FFT into smaller transforms.

In summary, our implementation shows that it is possible to achieve a nontrivial speedup for the computation of truncated products, over a wide range of values of $n$ . The author is hopeful that the running times may be improved further by more careful optimisation work, especially in the subroutines for computing the ring isomorphisms. It also seems likely that the performance for large values of $n$ may be improved by suitable tweaking of the underlying FFT algorithms.

Acknowledgments

The author thanks Joris van der Hoeven for his comments on a draft of this paper. The performance tests were carried out on the Katana computing cluster at UNSW [16]; many thanks to Martin Thompson for technical support. The author was supported by the Australian Research Council, grants DP150101689 and FT160100219.

Bibliography

[1] P. Barrett, Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor, Advances in cryptology—CRYPTO ’86 (Santa Barbara, Calif., 1986), Lecture Notes in Comput. Sci., vol. 263, Springer, Berlin, 1987, pp. 311–323. MR 907099 (88i:94015)
[2] D. Bernstein, Removing redundancy in high-precision Newton iteration, unpublished, available at http://cr.yp.to/papers.html#fastnewton, 2004.
[3] R. P. Brent and P. Zimmermann, Modern computer arithmetic, Cambridge Monographs on Applied and Computational Mathematics, vol. 18, Cambridge University Press, Cambridge, 2011. MR 2760886
[4] P. Bürgisser, M. Clausen, and M. A. Shokrollahi, Algebraic complexity theory, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 315, Springer-Verlag, Berlin, 1997, With the collaboration of Thomas Lickteig. MR 1440179 (99c:68002)
[5] M. Frigo, A Fast Fourier Transform Compiler, Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation (New York, NY, USA), PLDI ’99, ACM, 1999, pp. 169–180.
[6] I. M. Gessel, Lagrange inversion, J. Combin. Theory Ser. A 144 (2016), 212–249. MR 3534068
[7] T. Granlund, The GNU Multiple Precision Arithmetic Library (Version 6.2.1), http://gmplib.org/.
[8] D. Harvey and J. van der Hoeven, Faster integer and polynomial multiplication using cyclotomic coefficient rings, https://arxiv.org/abs/1712.03693, 2017.

