Numerically Stable Polynomially Coded Computing

Mohammad Fahim; Viveck R. Cadambe

arXiv:1903.08326·cs.IT·May 23, 2019

TL;DR

This paper introduces orthogonal polynomial-based codes for matrix multiplication that enhance numerical stability and fault tolerance, with theoretical bounds and empirical validation showing reduced errors compared to traditional Vandermonde-based methods.

Contribution

It develops new orthogonal polynomial codes, especially using Chebyshev polynomials, with proven bounds on condition numbers, leading to more numerically stable coded computing techniques.

Findings

01

Orthogonal polynomial codes achieve similar fault tolerance as previous codes.

02

Chebyshev-Vandermonde matrices have polynomially bounded condition numbers.

03

Empirical results show significantly lower numerical errors with the new methods.

Abstract

We study the numerical stability of polynomial based encoding methods, which have emerged as a powerful class of techniques for providing straggler and fault tolerance in the area of coded computing. Our contributions are as follows: 1) We construct new codes for matrix multiplication that achieve the same fault/straggler tolerance as the previously constructed MatDot Codes and Polynomial Codes. Unlike previous codes that use polynomials expanded in a monomial basis, our codes use a basis of orthogonal polynomials. 2) We show that the condition number of every $m \times m$ sub-matrix of an $m \times n, n \geq m$ Chebyshev-Vandermonde matrix, evaluated on the $n$-point Chebyshev grid, grows as $O (n^{2 (n - m)})$ for $n > m$. An implication of this result is that, when Chebyshev-Vandermonde matrices are used for coded computing, for a fixed number of redundant nodes $s = n - m,$ the condition…

Tables

Table I: A table depicting the relative errors of various schemes for $\Delta=P-(2m-1)=3$ redundant nodes. The error is measured via the Frobenius norm, i.e., $\frac{||\mathbf{A}\mathbf{B}-\hat{\mathbf{C}}||_{F}}{||\mathbf{A}\mathbf{B}||_{F}}$. The matrices $\mathbf{A},\mathbf{B}$ are chosen with entries $\mathcal{N}(0,1).$ The average relative error averages over all possible $3$ node failures, i.e., over every set of $2m-1$ nodes among the $P=2m+2$ nodes; the worst case relative error involves the worst set of $2m-1$ nodes. See Section V-C for more details.

Number of Workers $(P)$	MatDot worst case relative error	OrthoMatDot worst case relative error	MatDot average relative error	OrthoMatDot average relative error
30	$1.54 \times 10^{- 6}$	$5.14 \times 10^{- 11}$	$1.36 \times 10^{- 7}$	$1.36 \times 10^{- 13}$
50	$8.6 \times 10^{3}$	$1.27 \times 10^{- 9}$	$2.00 \times 10^{2}$	$2.04 \times 10^{- 13}$
80	$2.45 \times 10^{6}$	$1.98 \times 10^{- 8}$	$2.19 \times 10^{2}$	$3.08 \times 10^{- 12}$
150	$3.87 \times 10^{7}$	$7.84 \times 10^{- 7}$	$8.73 \times 10^{2}$	$2.03 \times 10^{- 11}$

Equations

\mathbf{A}=\begin{bmatrix}\mathbf{A}_{1}&\mathbf{A}_{2}\end{bmatrix},\quad\mathbf{B}=\begin{bmatrix}\mathbf{B}_{1}\\ \mathbf{B}_{2}\end{bmatrix},

\int_{-1}^{1}q_{i}(x)q_{j}(x)dx=\left\{\begin{array}{cc}1&\textrm{if }i=j\\ 0&\textrm{otherwise}\end{array}\right.

AB = \int_{- 1}^{1} p_{A} (x) p_{B} (x) d x .

\kappa (x) = \sup_{\delta x} \left(\frac{|| \delta f ||}{|| f ( x ) ||} \bigg/ \frac{|| \delta x ||}{|| x ||}\right) .

\langle f, g \rangle = \int_{a}^{b} f (x) g (x) w (x) d x

\displaystyle\langle q_{i},q_{j}\rangle=\left\{\begin{array}[]{ll}c_{i}&\text{if $i=j$,}\\ 0&\text{otherwise,}\end{array}\right.

\displaystyle\langle q_{i},q_{j}\rangle=\left\{\begin{array}[]{ll}1&\text{if $i=j$,}\\ 0&\text{otherwise,}\end{array}\right.

T_{n} (x) = 2 x T_{n - 1} (x) - T_{n - 2} (x),

\int_{a}^{b} f (x) w (x) d x = \sum_{i = 1}^{n} a_{i} f (x_{i}),

\displaystyle a_{i}=\int_{a}^{b}\bigg{(}\prod_{j\in[n]-i}\frac{x-\eta_{j}}{\eta_{i}-\eta_{j}}\bigg{)}w(x)dx,~{}i\in[n].

\rho_{i}^{(n)} = \cos \left(\frac{2 i - 1}{2 n} \pi\right), i \in [n] .

T_{n} (x) = 2^{n - 1} \prod_{i = 1}^{n} (x - \rho_{i}^{(n)}),

\displaystyle\mathbf{Q}^{(k,n)}(\mathbf{x})=\left(\begin{array}[]{ccc}q_{0}(x_{1})&\cdots&q_{0}(x_{n})\\ \vdots&\ddots&\vdots\\ q_{k-1}(x_{1})&\cdots&q_{k-1}(x_{n})\end{array}\right).

\displaystyle\mathbf{Q}_{\mathcal{S}}^{(k,n)}(\mathbf{x})=\left(\begin{array}[]{ccc}q_{0}(x_{s_{1}})&\cdots&q_{0}(x_{s_{r}})\\ \vdots&\ddots&\vdots\\ q_{k-1}(x_{s_{1}})&\cdots&q_{k-1}(x_{s_{r}})\end{array}\right).

\displaystyle\mathbf{G}^{(k,n)}(\mathbf{x})=\left(\begin{array}[]{ccc}T_{0}(x_{1})&\cdots&T_{0}(x_{n})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{1})&\cdots&T_{k-1}(x_{n})\end{array}\right),

\displaystyle\mathbf{G}_{\mathcal{S}}^{(k,n)}(\mathbf{x})=\left(\begin{array}[]{ccc}T_{0}(x_{s_{1}})&\cdots&T_{0}(x_{s_{r}})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{s_{1}})&\cdots&T_{k-1}(x_{s_{r}})\end{array}\right).

\displaystyle\tilde{\mathbf{G}}^{(k,n)}(\mathbf{x})=\left(\begin{array}[]{ccc}T_{0}(x_{1})/\sqrt{2}&\cdots&T_{0}(x_{n})/\sqrt{2}\\ T_{1}(x_{1})&\cdots&T_{1}(x_{n})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{1})&\cdots&T_{k-1}(x_{n})\end{array}\right),

\displaystyle\tilde{\mathbf{G}}_{\mathcal{S}}^{(k,n)}(\mathbf{x})=\left(\begin{array}[]{ccc}T_{0}(x_{s_{1}})/\sqrt{2}&\cdots&T_{0}(x_{s_{r}})/\sqrt{2}\\ T_{1}(x_{s_{1}})&\cdots&T_{1}(x_{s_{r}})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{s_{1}})&\cdots&T_{k-1}(x_{s_{r}})\end{array}\right).

\mathbf{A}=\left(\mathbf{A}_{0}\ \mathbf{A}_{1}\ \ldots\ \mathbf{A}_{m-1}\right),\;\;\;\mathbf{B}=\left(\begin{array}[]{c}\mathbf{B}_{0}\\ \mathbf{B}_{1}\\ \vdots\\ \mathbf{B}_{m-1}\end{array}\right),

\kappa_{F}^{\max} (G^{(n - s, n)} (\rho^{(n)})) = O \left((n - s) \sqrt{n s (n - s)} \, (2 n^{2})^{s - 1}\right),

\displaystyle\mathbf{M}=\left(\begin{array}[]{cccc}1&1&\cdots&1\\ x_{1}&x_{2}&\cdots&x_{P}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1}^{2m-2}&x_{2}^{2m-2}&\cdots&x_{P}^{2m-2}\end{array}\right).

E_{r} (AB, \hat{C}) = \frac{|| AB - \hat{C} ||_{F}}{|| AB ||_{F}} .

A

p_{A} (x) = A_{0} T_{0} (x) + A_{1} T_{1} (x) + A_{2} T_{2} (x),

p_{B} (x) = B_{0} T_{0} (x) + B_{1} T_{3} (x) + B_{2} T_{6} (x) .

\begin{aligned}p_{A}(x)p_{B}(x)&=A_{0}B_{0}+\big(A_{1}B_{0}+\frac{1}{2}A_{2}B_{1}\big)T_{1}(x)+\big(A_{2}B_{0}+\frac{1}{2}A_{1}B_{1}\big)T_{2}(x)\\ &\quad+A_{0}B_{1}T_{3}(x)+\frac{1}{2}\big(A_{1}B_{1}+A_{2}B_{2}\big)T_{4}(x)+\frac{1}{2}\big(A_{1}B_{2}+A_{2}B_{1}\big)T_{5}(x)\\ &\quad+A_{0}B_{2}T_{6}(x)+\frac{1}{2}A_{1}B_{2}T_{7}(x)+\frac{1}{2}A_{2}B_{2}T_{8}(x)\end{aligned}

\displaystyle\mathbf{A}\mathbf{B}=\left(\begin{array}[]{ccc}\mathbf{A}_{0}\mathbf{B}_{0}&\mathbf{A}_{0}\mathbf{B}_{1}&\mathbf{A}_{0}\mathbf{B}_{2}\\ \mathbf{A}_{1}\mathbf{B}_{0}&\mathbf{A}_{1}\mathbf{B}_{1}&\mathbf{A}_{1}\mathbf{B}_{2}\\ \mathbf{A}_{2}\mathbf{B}_{0}&\mathbf{A}_{2}\mathbf{B}_{1}&\mathbf{A}_{2}\mathbf{B}_{2}\end{array}\right).

\displaystyle\left(\begin{array}[]{c}\mathbf{C}_{T_{0}}^{(k,l)}\\ \mathbf{C}_{T_{1}}^{(k,l)}\\ \mathbf{C}_{T_{2}}^{(k,l)}\\ \mathbf{C}_{T_{3}}^{(k,l)}\\ \mathbf{C}_{T_{4}}^{(k,l)}\\ \mathbf{C}_{T_{5}}^{(k,l)}\\ \mathbf{C}_{T_{6}}^{(k,l)}\\ \mathbf{C}_{T_{7}}^{(k,l)}\\ \mathbf{C}_{T_{8}}^{(k,l)}\end{array}\right)=\left(\begin{array}[]{ccccccccc}1&0&0&0&0&0&0&0&0\\ 0&1&0&0&0&1/2&0&0&0\\ 0&0&1&0&1/2&0&0&0&0\\ 0&0&0&1&0&0&0&0&0\\ 0&0&0&0&1/2&0&0&0&1/2\\ 0&0&0&0&0&1/2&0&1/2&0\\ 0&0&0&0&0&0&1&0&0\\ 0&0&0&0&0&0&0&1/2&0\\ 0&0&0&0&0&0&0&0&1/2\end{array}\right)\left(\begin{array}[]{c}(\mathbf{A}_{0}\mathbf{B}_{0})^{(k,l)}\\ (\mathbf{A}_{1}\mathbf{B}_{0})^{(k,l)}\\ (\mathbf{A}_{2}\mathbf{B}_{0})^{(k,l)}\\ (\mathbf{A}_{0}\mathbf{B}_{1})^{(k,l)}\\ (\mathbf{A}_{1}\mathbf{B}_{1})^{(k,l)}\\ (\mathbf{A}_{2}\mathbf{B}_{1})^{(k,l)}\\ (\mathbf{A}_{0}\mathbf{B}_{2})^{(k,l)}\\ (\mathbf{A}_{1}\mathbf{B}_{2})^{(k,l)}\\ (\mathbf{A}_{2}\mathbf{B}_{2})^{(k,l)}\end{array}\right),

\displaystyle\left(\begin{array}[]{c}(\mathbf{A}_{0}\mathbf{B}_{0})^{(k,l)}\\ (\mathbf{A}_{1}\mathbf{B}_{0})^{(k,l)}\\ (\mathbf{A}_{2}\mathbf{B}_{0})^{(k,l)}\\ (\mathbf{A}_{0}\mathbf{B}_{1})^{(k,l)}\\ (\mathbf{A}_{1}\mathbf{B}_{1})^{(k,l)}\\ (\mathbf{A}_{2}\mathbf{B}_{1})^{(k,l)}\\ (\mathbf{A}_{0}\mathbf{B}_{2})^{(k,l)}\\ (\mathbf{A}_{1}\mathbf{B}_{2})^{(k,l)}\\ (\mathbf{A}_{2}\mathbf{B}_{2})^{(k,l)}\end{array}\right)=\left(\begin{array}[]{ccccccccc}1&0&0&0&0&0&0&0&0\\ 0&1&0&0&0&1/2&0&0&0\\ 0&0&1&0&1/2&0&0&0&0\\ 0&0&0&1&0&0&0&0&0\\ 0&0&0&0&1/2&0&0&0&1/2\\ 0&0&0&0&0&1/2&0&1/2&0\\ 0&0&0&0&0&0&1&0&0\\ 0&0&0&0&0&0&0&1/2&0\\ 0&0&0&0&0&0&0&0&1/2\end{array}\right)^{-1}\left(\begin{array}[]{c}\mathbf{C}_{T_{0}}^{(k,l)}\\ \mathbf{C}_{T_{1}}^{(k,l)}\\ \mathbf{C}_{T_{2}}^{(k,l)}\\ \mathbf{C}_{T_{3}}^{(k,l)}\\ \mathbf{C}_{T_{4}}^{(k,l)}\\ \mathbf{C}_{T_{5}}^{(k,l)}\\ \mathbf{C}_{T_{6}}^{(k,l)}\\ \mathbf{C}_{T_{7}}^{(k,l)}\\ \mathbf{C}_{T_{8}}^{(k,l)}\end{array}\right),

Code & Models

Repositories

mtarekf/poly-coded-computing

Full text

Numerically Stable Polynomially Coded Computing

Mohammad Fahim and Viveck R. Cadambe

M. Fahim and V. Cadambe are with the Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802. This work will be presented in part at the IEEE International Symposium on Information Theory (ISIT), July 2019.

Abstract

We study the numerical stability of polynomial based encoding methods, which have emerged as a powerful class of techniques for providing straggler and fault tolerance in the area of coded computing. Our contributions are as follows:

1. We construct new codes for matrix multiplication that achieve the same fault/straggler tolerance as the previously constructed MatDot Codes and Polynomial Codes. Unlike previous codes that use polynomials expanded in a monomial basis, our codes use a basis of orthogonal polynomials.

2. We show that the condition number of every $m\times m$ sub-matrix of an $m\times n,n\geq m$ Chebyshev-Vandermonde matrix, evaluated on the $n$-point Chebyshev grid, grows as $O(n^{2(n-m)})$ for $n>m$. An implication of this result is that, when Chebyshev-Vandermonde matrices are used for coded computing, for a fixed number of redundant nodes $s=n-m,$ the condition number grows at most polynomially in the number of nodes $n$.

3. By specializing our orthogonal polynomial based constructions to Chebyshev polynomials, and using our condition number bound for Chebyshev-Vandermonde matrices, we construct new numerically stable techniques for coded matrix multiplication. We empirically demonstrate that our constructions have significantly lower numerical errors compared to previous approaches which involve inversion of Vandermonde matrices. We generalize our constructions to explore the trade-off between computation/communication and fault-tolerance.

4. We propose a numerically stable specialization of Lagrange coded computing. Motivated by our condition number bound, our approach involves the choice of evaluation points and a suitable decoding procedure that involves inversion of an appropriate Chebyshev-Vandermonde matrix. Our approach is demonstrated empirically to have lower numerical errors as compared to standard methods.

I Introduction

The recently emerging area of “coded computing” focuses on incorporating redundancy based on coding-theory-inspired strategies to tackle central challenges in distributed computing, including stragglers, failures, processing errors, communication bottlenecks and security issues. Such ideas have been applied to different large scale distributed computations such as matrix multiplication [1, 2, 3, 4, 5], gradient methods [6, 7, 8], linear solvers [9, 10, 11] and multi-variate polynomial evaluation [12]. An important idea that has emerged from this body of work is the use of novel, Reed-Solomon-like polynomial based methods for encoding data. In polynomial based methods, each computation node stores a linearly encoded combination of the data partitions, where the data stored at different worker nodes can be interpreted as evaluations of an appropriate polynomial at different points. The nodes then perform computation on these encoded versions of the data, and a central master/fusion node aggregates the outputs of these computations to recover the overall result via a decoding process that inevitably involves polynomial interpolation. Much like Reed-Solomon codes, if the number of nodes performing the computation is higher than the number of evaluation points required for accurate interpolation, the overall computation is tolerant to faults and stragglers.

Perhaps the most striking application of polynomial based methods comes in the context of matrix multiplication. To multiply two $N\times N$ matrices $\mathbf{A},\mathbf{B},$ assuming that each node stores $1/m$ of each matrix, classical work in algorithm based fault tolerance [13] outlines a coding based method which has been analyzed in [14]. Reference [2] showed through polynomial based encoding methods that the result of just $m^{2}$ nodes can be used by the master node to recover the matrix-product. Remarkably, this means that polynomial based codes ensure that the recovery threshold - the worst case number of nodes whose computation suffices to recover the overall matrix-product - does not grow with $P$ , the number of the distributed system’s worker nodes, unlike the approaches of [13, 14]. The recovery threshold for matrix multiplication has been improved to $2m-1$ via a code construction called MatDot Codes in [3], albeit at a higher communication/computation cost than codes in [2]. A second prominent application of polynomial based methods is the idea of Lagrange coded computing [12], where coding is applied for multi-variate polynomial computing with guarantees of straggler resilience, security and privacy. In addition, polynomial-based methods are also useful for communication-efficient approaches for inverse problems and gradient methods [8, 15, 10].

Despite their enormous success, the scalability of polynomial based methods in practice is limited by an “inconvenient truth”: their numerical instability. The decoding methods for polynomial based methods require interpolating a degree $K-1$ polynomial using $K$ evaluation points. While this is numerically stable for classical error correcting codes for communication and storage, which are implemented over finite fields, we are concerned here with data processing applications where the operations are typically real-valued. The main reason for the instability is that, either implicitly or explicitly, interpolation effectively solves a linear system whose transform is characterized by a Vandermonde matrix. It is well known that the condition number of Vandermonde matrices with real-valued nodes grows exponentially in the dimension of the matrix [16, 17, 18, 19]. The large condition number means that small perturbations of the Vandermonde matrix due to numerical precision errors can result in singular matrices [20, 21]. In practice, this can translate to large numerical errors even when the coded computation is distributed among a few tens of nodes. (Footnote 1: For example, [22] reports that “In our experiments we observed large floating point errors when inverting high degree Vandermonde matrices for polynomial interpolation”.) Conventional intuition dictates that the main scalability bottlenecks in distributed computing include computation cost per worker, communication bottlenecks, and stragglers. However, for polynomially coded computing, it turns out that numerical stability is also critical and constitutes a major bottleneck for the scalability of such codes. Indeed, a polynomially coded computing scheme that achieves the minimum recovery threshold, and that is optimal computation/communication wise, will simply fail once implemented on a distributed system with tens of computing nodes due to the large numerical errors.
Thus, the main contribution of our paper is a new numerically stable approach to polynomially coded computing.

II Summary of Contributions

In this paper, we develop a new, numerically stable, approach for polynomially coded computing. A significant difference from previous polynomial coding approaches is that we depart from the monomial basis, which allows us to circumvent the inherently ill-conditioned Vandermonde-matrices. We demonstrate our approach through two important applications of polynomially coded computing: matrix multiplication, and Lagrange coded computing.

To illustrate our results, consider the coded matrix multiplication problem, where the goal is to multiply two matrices $\mathbf{A},\mathbf{B}$ over $P$ computation nodes where each node stores $1/m$ of each of the two matrices. A master node encodes $\mathbf{A},\mathbf{B}$ into $P$ matrices each, and sends these matrices respectively to each worker node. Each worker node multiplies the received encoded matrices, and sends the product back to the fusion node, which aims to recover $\mathbf{A}\mathbf{B}$ from a subset of the worker nodes. (Footnote 2: The master and fusion nodes are logical entities; in practice, they may be the same node, or may be emulated in a decentralized manner by the computation nodes.) The recovery threshold is defined as a number $K$ such that the computation of any set of $K$ worker nodes suffices to recover the product $\mathbf{A}\mathbf{B}.$ The MatDot scheme of [3] achieves the best known recovery threshold of $2m-1$. We begin with an example of MatDot Codes for $m=2.$

Example 1: MatDot Codes [3], recovery threshold = 3: Consider two $N\times N$ matrices

$$\mathbf{A}=\begin{bmatrix}\mathbf{A}_{1}&\mathbf{A}_{2}\end{bmatrix},\quad\mathbf{B}=\begin{bmatrix}\mathbf{B}_{1}\\ \mathbf{B}_{2}\end{bmatrix},$$

where $\mathbf{A}_{1},\mathbf{A}_{2}$ are $N\times N/2$ matrices and $\mathbf{B}_{1},\mathbf{B}_{2}$ are $N/2\times N$ matrices. Define $p_{\mathbf{A}}(x)=\mathbf{A}_{1}+\mathbf{A}_{2}x$ and $p_{\mathbf{B}}(x)=\mathbf{B}_{1}x+\mathbf{B}_{2},$ and let $x_{1},\cdots,x_{P}$ be distinct real values. Notice that $\mathbf{AB}=\mathbf{A}_{1}\mathbf{B}_{1}+\mathbf{A}_{2}\mathbf{B}_{2}$ is the coefficient of $x$ in the polynomial $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$. In MatDot Codes, as illustrated in Fig. 1, worker node $i$ computes $p_{\mathbf{A}}(x_{i})p_{\mathbf{B}}(x_{i}),~i=1,2,\ldots,P,$ so that from any $3$ of the $P$ nodes, the polynomial $p(x)=\mathbf{A}_{1}\mathbf{B}_{2}+(\mathbf{A}_{1}\mathbf{B}_{1}+\mathbf{A}_{2}\mathbf{B}_{2})x+\mathbf{A}_{2}\mathbf{B}_{1}x^{2}$ can be interpolated. Having interpolated the polynomial, the product $\mathbf{A}\mathbf{B}$ is simply the coefficient of $x$.
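The $m=2$ MatDot example can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the matrix size `N`, the four evaluation points in `xs`, and the helper names `worker` and `decode` are assumptions chosen for the sketch. Decoding solves a $3\times 3$ Vandermonde system and reads off the coefficient of $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                    # illustrative size (assumption)
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
A1, A2 = A[:, :N // 2], A[:, N // 2:]    # column blocks, each N x N/2
B1, B2 = B[:N // 2, :], B[N // 2:, :]    # row blocks, each N/2 x N

def worker(x):
    """Evaluate p_A(x) p_B(x) with p_A = A1 + A2 x and p_B = B1 x + B2."""
    return (A1 + A2 * x) @ (B1 * x + B2)

xs = np.array([-1.0, 0.0, 1.0, 2.0])     # P = 4 distinct points (assumption)
results = [worker(x) for x in xs]

def decode(survivors):
    """Interpolate the degree-2 product polynomial from any 3 workers."""
    V = np.vander(xs[survivors], 3, increasing=True)       # rows [1, x, x^2]
    Y = np.stack([results[i].ravel() for i in survivors])  # 3 x N^2
    coeffs = np.linalg.solve(V, Y)                         # monomial coefficients
    return coeffs[1].reshape(N, N)                         # coefficient of x is AB

C_hat = decode([0, 2, 3])                # worker 1 is treated as a straggler
print(np.linalg.norm(C_hat - A @ B) / np.linalg.norm(A @ B))
```

Any other choice of three surviving workers decodes the same product, which is what a recovery threshold of 3 means in this example.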

A generalization of the above example leads to a recovery threshold of $2m-1$, with a decoding process that involves effectively inverting a $(2m-1)\times(2m-1)$ Vandermonde matrix. It has been shown that the condition number of the $n\times n$ Vandermonde matrix grows exponentially in $n$ with both $\ell_{\infty}$ and $\ell_{2}$ norms [16, 17]. The intuition behind the inherent poor conditioning of the monomial basis $\{1,x,x^{2},\ldots,x^{2m-2}\}$ is demonstrated in Fig. 4 and Fig. 4.
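The growth of the Vandermonde condition number can be observed directly. The sketch below is only an illustration: the equispaced points on $[-1,1]$, the sizes, and the helper name `vandermonde_cond` are assumptions, and the 2-norm condition number is used because it is NumPy's default.

```python
import numpy as np

def vandermonde_cond(n):
    """2-norm condition number of an n x n Vandermonde matrix on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n)          # equispaced nodes (assumption)
    V = np.vander(x, n, increasing=True)   # columns 1, x, ..., x^(n-1)
    return np.linalg.cond(V)

conds = {n: vandermonde_cond(n) for n in (5, 10, 15, 20)}
for n, c in conds.items():
    print(f"n = {n:2d}   cond_2(V) = {c:.3e}")
```

The printed values increase rapidly with `n`, consistent with the exponential growth cited above.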

Motivated by Fig. 4, we aim, in this paper, to choose polynomials that are orthonormal. However, it is not immediately clear whether orthonormal polynomials are applicable to matrix multiplication. We demonstrate the applicability of orthonormal codes for matrix multiplication. For the example below, let $q_{0}(x),q_{1}(x)$ denote two orthonormal polynomials such that

$$\int_{-1}^{1}q_{i}(x)q_{j}(x)dx=\begin{cases}1&\text{if }i=j\\ 0&\text{otherwise}\end{cases}\qquad(1)$$

where $q_{i}(x),i=0,1$ has degree $i$.

Example 2: OrthoMatDot Codes [This paper], recovery threshold = 3: For two $N\times N$ matrices $\mathbf{A}=\begin{bmatrix}\mathbf{A}_{1}&\mathbf{A}_{2}\end{bmatrix},\mathbf{B}=\begin{bmatrix}\mathbf{B}_{1}\\ \mathbf{B}_{2}\end{bmatrix},$ let $p_{\mathbf{A}}(x)=\mathbf{A}_{1}q_{0}(x)+\mathbf{A}_{2}q_{1}(x)$ and $p_{\mathbf{B}}(x)=\mathbf{B}_{1}q_{0}(x)+\mathbf{B}_{2}q_{1}(x).$ Notice that because of (1), we have

$$\mathbf{A}\mathbf{B}=\int_{-1}^{1}p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)dx.$$

This leads to the following coded computing scheme: worker node $i$ computes $p_{\mathbf{A}}(x_{i})p_{\mathbf{B}}(x_{i}),~{}i=1,2,\ldots P,$ where $x_{1},\cdots,x_{P}$ are distinct real values, so that from any $3$ of the $P$ nodes, the fusion node can interpolate $p(x)=p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ . Having interpolated the polynomial, the fusion node obtains the product $\mathbf{A}\mathbf{B}$ by performing $\int_{-1}^{1}p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)dx$ . This example is illustrated in Fig. 5.
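A minimal sketch of this $m=2$ OrthoMatDot example, under stated assumptions: the weight is $w(x)=1$ on $[-1,1]$, so the orthonormal polynomials are the normalized Legendre polynomials $q_{0}(x)=1/\sqrt{2}$ and $q_{1}(x)=\sqrt{3/2}\,x$; the evaluation points `xs` and helper names are illustrative. The fusion node interpolates in the (unnormalized) Legendre basis $P_{0},P_{1},P_{2}$ and uses $\int_{-1}^{1}P_{k}(x)dx=0$ for $k\geq 1$, so the integral equals twice the $P_{0}$ coefficient.

```python
import numpy as np
from numpy.polynomial import legendre as L

rng = np.random.default_rng(1)
N = 4                                    # illustrative size (assumption)
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
A1, A2 = A[:, :N // 2], A[:, N // 2:]
B1, B2 = B[:N // 2, :], B[N // 2:, :]

def q0(x):
    return 1.0 / np.sqrt(2.0)            # orthonormal Legendre, degree 0

def q1(x):
    return np.sqrt(1.5) * x              # orthonormal Legendre, degree 1

def worker(x):
    """Evaluate p_A(x) p_B(x) in the orthonormal basis {q0, q1}."""
    return (A1 * q0(x) + A2 * q1(x)) @ (B1 * q0(x) + B2 * q1(x))

xs = np.array([-0.9, -0.3, 0.4, 0.8])    # P = 4 distinct points (assumption)
results = [worker(x) for x in xs]

def decode(survivors):
    """Interpolate in the Legendre basis, then integrate over [-1, 1]."""
    G = L.legvander(xs[survivors], 2)                      # columns P0, P1, P2
    Y = np.stack([results[i].ravel() for i in survivors])  # 3 x N^2
    c = np.linalg.solve(G, Y)                              # Legendre coefficients
    return (2.0 * c[0]).reshape(N, N)                      # integral = 2 * c_0

C_hat = decode([0, 1, 3])
print(np.linalg.norm(C_hat - A @ B) / np.linalg.norm(A @ B))
```

The only change relative to a monomial-basis decoder is the interpolation matrix and the final integration step; the worker computation has the same shape and cost.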

A simple generalization of the above example, described in Construction 1 in Section IV, leads to a class of codes, which we refer to as OrthoMatDot Codes, with a recovery threshold of $2m-1$, the same recovery threshold as MatDot Codes. In general, orthonormal polynomials are defined over an arbitrary weight measure $\int_{-1}^{1}{\bf\cdot}~w(x)dx;$ some well known classes of polynomials corresponding to different weight measures $w(x)$ include Legendre, Chebyshev, Jacobi and Laguerre polynomials [20, 21] (see Section III for definitions). Our OrthoMatDot Codes in Section IV can use any weight measure, and therefore can be used with different classes of orthonormal polynomials. Of particular interest to our paper are the Chebyshev polynomials (Fig. 4).

With our basic template, the task of developing numerically stable codes boils down to (A) interpolating $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ in a numerically stable manner, and (B) integrating this polynomial in a numerically stable manner. For task (B), we use a decoding procedure via Gauss Quadrature [20, 23, 21] to recover the integral. Task (A) is particularly challenging in the coding setting, because our goal is to interpolate the coefficients of $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ - expanded over a series of orthonormal polynomials - from any $2m-1$ points among a set of $P$ points.
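Task (B) relies on the fact that an $n$-point Gauss quadrature rule integrates polynomials of degree up to $2n-1$ exactly. The sketch below illustrates this with NumPy's Gauss-Legendre rule (weight $w(x)=1$, an assumption for illustration); the test polynomial `coeffs` is arbitrary.

```python
import numpy as np
from numpy.polynomial import legendre as L
from numpy.polynomial import polynomial as Pn

# Degree-5 polynomial, coefficients ordered from x^0 to x^5 (arbitrary choice).
coeffs = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 4.0])

# A 3-point Gauss-Legendre rule is exact for degree <= 2*3 - 1 = 5.
nodes, weights = L.leggauss(3)
quad = np.sum(weights * Pn.polyval(nodes, coeffs))

# Closed form: the integral of x^k over [-1, 1] is 2/(k+1) for even k, else 0.
exact = sum(c * 2.0 / (k + 1) for k, c in enumerate(coeffs) if k % 2 == 0)

print(quad, exact)
```

In the coded-computing setting the integrand is matrix-valued, but the same weighted sum of evaluations applies entry by entry.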

In Section V, we provide a specialization to the class of OrthoMatDot Codes, a numerically stable matrix multiplication code construction that has the same recovery threshold and communication/computation cost per worker as MatDot codes. The construction specializes the class of OrthoMatDot Codes via the use of Chebyshev polynomials, which are a class of orthogonal polynomials that are ubiquitous in numerical methods and approximation theory [21]. Construction 2 also specifies the choice of evaluation points $x_{1},x_{2},\ldots,x_{P}.$

The decoding procedure outlined for the specialization of OrthoMatDot Codes in Section V involves the effective inversion of some $(2m-1)\times(2m-1)$ sub-matrix of a $(2m-1)\times P$ Chebyshev-Vandermonde matrix [19], where the $i$-th column contains evaluations of the first $2m-1$ Chebyshev polynomials at $x_{i},i=1,2,\ldots,P$. A key technical result of our paper shows that, with our choice of evaluation points $x_{1},x_{2},\ldots,x_{P},$ every $(2m-1)\times(2m-1)$ square sub-matrix of the $(2m-1)\times P$ Chebyshev-Vandermonde matrix is well-conditioned. More precisely, we show that, with our choice of $x_{1},x_{2},\ldots,x_{P}$, the condition number of any $(2m-1)\times(2m-1)$ sub-matrix of the Chebyshev-Vandermonde matrix grows at most polynomially in $P$ when the number of redundant parity nodes $\Delta=P-(2m-1)$ is fixed. Our condition number bound may be viewed as a result of independent interest in the area of numerical methods, and requires non-trivial use of techniques from numerical approximation theory. This result is in contrast with the well known exponential growth for Vandermonde systems. We also show the significant improvement in stability via numerical experiments in Section V-C. A preview of the results is given in Table I, which demonstrates that, remarkably, our Chebyshev-Vandermonde construction with even $P=150$ nodes has a smaller relative error than the Vandermonde-based MatDot Codes with $P=30$ nodes. (Footnote 3: We note that the numerical error depends not only on the condition number of the matrix, but also on the algorithm used for solving the linear system. However, we are not aware of any approach that can accurately solve, say, a $150\times 150$ linear system with a Vandermonde matrix; see, e.g., [24, 25].)
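The contrast between the two bases can be checked by brute force on a small instance. The sketch below is an illustration under assumptions: $P=24$ workers, $\Delta=2$ redundant nodes, evaluation points taken as the $P$-point Chebyshev grid $\cos\big(\frac{2i-1}{2P}\pi\big)$, and the Frobenius-norm condition number; the helper `worst_cond` is hypothetical. It enumerates every square sub-matrix obtained by dropping $\Delta$ points and reports the worst condition number for the Chebyshev basis and for the monomial basis on the same points.

```python
import itertools
import numpy as np
from numpy.polynomial import chebyshev as C

def worst_cond(basis_matrix, K):
    """Largest Frobenius condition number over all K-point sub-matrices.

    basis_matrix has one row per evaluation point and K columns
    (one per basis polynomial), so each K-row subset is square.
    """
    P = basis_matrix.shape[0]
    return max(
        np.linalg.cond(basis_matrix[list(S), :], "fro")
        for S in itertools.combinations(range(P), K)
    )

P_nodes, delta = 24, 2                   # illustrative sizes (assumption)
K = P_nodes - delta                      # number of points needed to decode
i = np.arange(1, P_nodes + 1)
rho = np.cos((2 * i - 1) * np.pi / (2 * P_nodes))   # Chebyshev grid

cheb = C.chebvander(rho, K - 1)          # entries T_j(rho_i), j = 0..K-1
mono = np.vander(rho, K, increasing=True)  # entries rho_i^j,  j = 0..K-1

worst_cheb = worst_cond(cheb, K)
worst_mono = worst_cond(mono, K)
print(f"worst kappa_F, Chebyshev basis: {worst_cheb:.3e}")
print(f"worst kappa_F, monomial basis:  {worst_mono:.3e}")
```

Exhaustive enumeration is only feasible for small $\Delta$; it is used here purely to make the worst-case statement concrete.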

While MatDot Codes [3] have an optimal recovery threshold of $2m-1$, they have a relatively higher computation cost per worker ($O(N^{3}/m)$) and worker node to fusion node communication cost ($O(N^{2})$) as compared to Polynomial Codes [2], which have a computation cost per worker of $O(N^{3}/m^{2})$ and a worker node to fusion node communication cost of $O(N^{2}/m^{2})$. In particular, each worker in MatDot Codes performs an “outer” product of an $N\times N/m$ matrix with an $N/m\times N$ matrix, whereas each worker in Polynomial Codes performs an “inner” product of an $N/m\times N$ matrix with an $N\times N/m$ matrix. The reduced computation/communication comes at the cost of weaker fault-tolerance: Polynomial Codes have a higher recovery threshold of $m^{2}$ as compared with MatDot Codes ($2m-1$). In Section VI, we develop numerically stable codes for matrix multiplication, again via orthogonal polynomials, that achieve the same low computation/communication costs as Polynomial Codes as well as the same recovery threshold. We refer to these codes as OrthoPoly Codes.
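The per-worker costs quoted above follow from the block shapes alone. The sketch below tabulates them; the function name `per_worker_cost`, the flop-count convention ($2\cdot$rows$\cdot$inner$\cdot$cols for a dense product), and the sample values of `N` and `m` are assumptions made for illustration.

```python
def per_worker_cost(N, m, scheme):
    """Return (flops, output_entries) for one worker, assuming m divides N."""
    if scheme == "matdot":
        a_shape, b_shape = (N, N // m), (N // m, N)    # "outer" product
    elif scheme == "polynomial":
        a_shape, b_shape = (N // m, N), (N, N // m)    # "inner" product
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    flops = 2 * a_shape[0] * a_shape[1] * b_shape[1]   # dense multiply-adds
    output_entries = a_shape[0] * b_shape[1]           # sent to the fusion node
    return flops, output_entries

def recovery_threshold(m, scheme):
    """Recovery thresholds stated in the text: 2m-1 (MatDot), m^2 (Polynomial)."""
    return 2 * m - 1 if scheme == "matdot" else m * m

N, m = 1200, 4
for scheme in ("matdot", "polynomial"):
    flops, out = per_worker_cost(N, m, scheme)
    print(scheme, flops, out, recovery_threshold(m, scheme))
```

For these sample values, the MatDot worker performs $m$ times the computation and returns $m^{2}$ times as many entries, while needing 7 rather than 16 workers to decode.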

The trade-off between computation/communication cost and recovery threshold imposed by MatDot Codes and Polynomial Codes has motivated general code constructions that interpolate between them [3, 5, 26], albeit using the monomial basis. In Section VII, we extend our approach to a general matrix multiplication code construction, referred to as Generalized OrthoMatDot, that offers a computation/communication cost vs. recovery threshold trade-off, following the research thread for the monomial basis [3, 5, 26]; unlike those works, we also target numerical stability in our proposed construction. While our Generalized OrthoMatDot Codes specialize to OrthoMatDot Codes, i.e., they achieve the same optimal recovery threshold as OrthoMatDot Codes when allowing for the same computation/communication cost as OrthoMatDot Codes, they do not specialize to OrthoPoly Codes. Specifically, Generalized OrthoMatDot Codes have a higher recovery threshold than OrthoPoly Codes when allowing for the same computation/communication cost as OrthoPoly Codes. In Section VIII, we exploit the result obtained in Theorem V.1 on the condition number of the square $K\times K$ sub-matrices of the $K\times P$ Chebyshev-Vandermonde matrices to propose a numerically stable algorithm for Lagrange coded computing. In Section IX, we conclude with a discussion on other related problems such as matrix-vector multiplication [13, 27], and describe some related open questions.

III Preliminaries on Numerical Analysis and Notations

In this section, we discuss the problem of finite precision in representing real numbers on digital machines and how it may severely affect the output of computations performed on these machines. We also introduce some basic definitions and results from numerical approximation theory that will be used in this paper [23], [28]. At the end of this section, we list the common notation used in this paper.

III-A Preliminaries on Numerical Analysis

Since digital machines have finite memory, real numbers are stored using a finite number of bits, i.e., with finite precision. A finite number of bits can represent only finitely many real numbers exactly, so storing real numbers this way inevitably introduces errors: real numbers that cannot be represented exactly have to be either truncated or rounded off in order to fit in memory. Although such a perturbation (e.g., truncation/round-off error) can be negligibly small, the perturbation of the output of a computation that takes these slightly perturbed numbers as input is not necessarily small as well. In fact, a very small perturbation of the input of some computations may lead to an output that is far from the correct output. The condition number of a computation problem measures this effect.

Definition III.1 (Condition Number)

Let $f$ be a function representing a computation problem with input $x$, let $\delta x$ be a small perturbation of $x$, and define $\delta f=f(x+\delta x)-f(x)$ to be the perturbation of $f$ due to $\delta x$. The condition number of the problem at $x$ with respect to some norm $||\cdot||$ is

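$$\kappa(x)=\lim_{\epsilon\to 0}\ \sup_{||\delta x||\leq\epsilon}\frac{||\delta f||/||f(x)||}{||\delta x||/||x||}.$$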

Given the above definition of condition number, a problem is said to be “ill-conditioned” if small perturbations in the input lead to large perturbation in the output (i.e., the condition number is large). On the other hand, a problem is said to be “well-conditioned” if small perturbations in the input lead to small perturbations in the output (i.e., the condition number is small).

In what follows, we discuss the condition number of two computation problems: matrix-vector multiplication and solving a system of linear equations. For both problems, consider the system of linear equations $\mathbf{A}\mathbf{x}=\mathbf{y}$, where $\mathbf{A}\in\mathbb{R}^{n\times n}$ is non-singular and $\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}$, and let $||\cdot||$ be some matrix norm. With $\mathbf{A}$ fixed, the condition number of the matrix-vector multiplication problem, whose output is $\mathbf{y}$ and whose input $\mathbf{x}$ is subject to small perturbations, satisfies $\kappa(\mathbf{x})\leq||\mathbf{A}||\,||\mathbf{A}^{-1}||$ for any $\mathbf{x}\in\mathbb{R}^{n}$. Similarly, with $\mathbf{A}$ still fixed, the condition number of the problem of solving $\mathbf{A}\mathbf{x}=\mathbf{y}$, whose output is $\mathbf{x}$ and whose input $\mathbf{y}$ is subject to small perturbations, satisfies $\kappa(\mathbf{y})\leq||\mathbf{A}||\,||\mathbf{A}^{-1}||$ for any $\mathbf{y}\in\mathbb{R}^{n}$.
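As a small illustration of the bound on $\kappa(\mathbf{y})$, the following sketch perturbs the right-hand side of a linear system and compares the observed relative amplification with $||\mathbf{A}||\,||\mathbf{A}^{-1}||$ in the $\ell_{2}$ norm. The choice of matrix (a monomial Vandermonde matrix on equispaced points), the size `n = 8`, the perturbation scale, and the helper name `relative_amplification` are assumptions made for this example only.

```python
import numpy as np

def relative_amplification(A, y, dy):
    """Relative change of the solution of A x = y divided by the relative change of y."""
    x = np.linalg.solve(A, y)
    dx = np.linalg.solve(A, y + dy) - x
    return (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(dy) / np.linalg.norm(y))

rng = np.random.default_rng(0)
n = 8
# Illustrative ill-conditioned matrix: monomial Vandermonde on equispaced points in [-1, 1].
A = np.vander(np.linspace(-1.0, 1.0, n), increasing=True)
# ||A|| ||A^{-1}|| in the l2 (spectral) norm.
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

y = rng.standard_normal(n)
dy = 1e-8 * rng.standard_normal(n)   # small perturbation of the input y
amp = relative_amplification(A, y, dy)
print(f"kappa = {kappa:.3e}, observed amplification = {amp:.3e}")
```

The observed amplification depends on the particular $\mathbf{y}$ and perturbation drawn, but it should not exceed `kappa` beyond rounding effects.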

Since we focus on polynomially coded computing, next, we introduce some basic tools of numerical approximation theory that will be used throughout this paper. Notice that, in the following, $C[a,b]$ denotes the vector space of continuous integrable functions defined on the interval $[a,b]$ .

Definition III.2 (Inner Products on $C[a,b]$ )

For any $f,g\in C[a,b]$ , and given a non-negative integrable weight function $w$ ,

$$\langle f,g\rangle=\int_{a}^{b}f(x)g(x)w(x)\,dx$$

defines an inner product on $C[a,b]$ relative to $w$ .

Definition III.3 (Orthogonal Polynomials)

Given a non-negative integrable weight function $w$, the polynomials $\{q_{i}\}_{i\geq 0}$ in $C[a,b]$, where $q_{i}(x)$ has degree $i$ and

$$\langle q_{i},q_{j}\rangle=\begin{cases}0 & \text{if } i\neq j,\\ c_{i} & \text{if } i=j,\end{cases}$$

for some non-zero values $c_{i}$, where the inner product is relative to $w$, are called orthogonal polynomials relative to $w$.

Definition III.4 (Orthonormal Polynomials)

Given a non-negative integrable weight function $w$, the polynomials $\{q_{i}\}_{i\geq 0}$ in $C[a,b]$, where $q_{i}(x)$ has degree $i$, such that

$$\langle q_{i},q_{j}\rangle=\begin{cases}0 & \text{if } i\neq j,\\ 1 & \text{if } i=j,\end{cases}$$

where the inner product is relative to $w$, are called orthonormal polynomials relative to $w$.

Note that based on the above definitions, if the polynomials $\{q_{i}\}_{i\geq 0}$ are orthogonal (or orthonormal), then $q_{n}(x)$ is orthogonal to all polynomials of degree at most $n-1$, i.e., $\langle p_{n-1}(x),q_{n}(x)\rangle=0$ for any polynomial $p_{n-1}\in C[a,b]$ of degree strictly less than $n$. It is also worth noting that for $w(x)=1,a=-1,b=1$, the orthogonal polynomials are the Legendre polynomials, which are obtained by applying the Gram-Schmidt procedure to $\{1,x,x^{2},\ldots\}$ sequentially. In addition, the following is an important class of orthogonal polynomials in our paper.

Example III.1 (Chebyshev polynomials of the first kind)

The following recurrence relation defines the Chebyshev polynomials of the first kind:

$$T_{n+1}(x)=2xT_{n}(x)-T_{n-1}(x),\quad n\geq 1,$$

where $T_{0}(x)=1$ and $T_{1}(x)=x$. Chebyshev polynomials are a cornerstone of modern numerical approximation theory and practice, with applications to numerical integration and least-squares approximation of continuous functions [23], [28]. The polynomials $\frac{1}{\sqrt{2}}T_{0},T_{1},T_{2},\cdots$ are orthonormal relative to the weight function $\frac{2}{\pi\sqrt{1-x^{2}}}$ on $[-1,1]$. In general, Chebyshev polynomials are defined for all $x\in\mathbb{R}$. However, for $x\in[-1,1]$, $T_{n}(x)=\cos(n\arccos(x))$ for any $n\in\mathbb{N}$. For the rest of this paper, unless otherwise stated, whenever Chebyshev polynomials are used, they are restricted to the range $[-1,1]$.
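The two descriptions of $T_{n}$ in Example III.1 can be checked against each other numerically. The sketch below evaluates $T_{n}$ with the three-term recurrence $T_{k+1}(x)=2xT_{k}(x)-T_{k-1}(x)$ and compares it with $\cos(n\arccos(x))$ on $[-1,1]$; the function name `chebyshev_T`, the grid of 201 test points, and the range of degrees are choices made for this illustration.

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_n(x) via the recurrence T_{k+1} = 2 x T_k - T_{k-1}, T_0 = 1, T_1 = x."""
    x = np.asarray(x, dtype=float)
    t_prev, t_curr = np.ones_like(x), x.copy()
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

x = np.linspace(-1.0, 1.0, 201)
# Largest gap between the recurrence and the trigonometric form over degrees 0..11.
max_gap = max(
    np.max(np.abs(chebyshev_T(n, x) - np.cos(n * np.arccos(x)))) for n in range(12)
)
print(max_gap)
```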

We state, next, two results from [28] in Theorems III.1 and III.2.

Theorem III.1

Let $w$ be a weight function on the range $[a,b]$, i.e., $w$ is a non-negative integrable function on $[a,b]$, and let $x_{1},\cdots,x_{n}$ be distinct real numbers such that $a<x_{1}<\cdots<x_{n}<b$. Then there exist unique weights $a_{1},\cdots,a_{n}$ such that

$$\int_{a}^{b}f(x)w(x)\,dx=\sum_{i=1}^{n}a_{i}f(x_{i})$$

for all polynomials $f$ with degree less than $n$ .

Theorem III.1 is not surprising: the left-hand side of the equation stated in the theorem is a linear functional on the vector space of polynomials of degree at most $n-1$. Because of Lagrange interpolation, any polynomial of degree at most $n-1$ is itself a linear function of its evaluations at $n$ distinct points. Therefore, the left-hand side can be expressed as a weighted sum of the function's evaluations at the $n$ points. We next state a remarkable result by Gauss, which gives conditions under which the expression of Theorem III.1 is exact for polynomials of degree up to $2n-1$, even though the number of evaluation points is just $n$.

Theorem III.2 (Gauss Quadrature)

Fix a weight function $w$ , and let $\{q_{i}\}_{i\geq 0}$ be a set of orthonormal polynomials in $C[a,b]$ relative to $w$ . Given $n$ , let $\eta_{1},\cdots,\eta_{n}$ be the roots of $q_{n}$ such that $a\leq\eta_{1}<\eta_{2}<\cdots<\eta_{n}\leq b$ , and choose real values $a_{1},\cdots,a_{n}$ such that $\sum_{i=1}^{n}a_{i}f(\eta_{i})=\int_{a}^{b}f(x)w(x)dx$ , for any $f\in C[a,b]$ with degree less than $n$ . Then, $\sum_{i=1}^{n}a_{i}f(\eta_{i})=\int_{a}^{b}f(x)w(x)dx$ , for any polynomial $f$ with degree less than $2n$ .

Remark III.1

1. Consider any orthonormal polynomials $\{q_{i}\}_{i\geq 0}$. For any $n\in\mathbb{N}$, the set $\{q_{0},q_{1},\cdots,q_{n-1}\}$ forms a basis for the vector space of polynomials with degree less than $n$.

2. In Theorem III.2, $a_{1},\cdots,a_{n}$ can be chosen as

$$a_{i}=\int_{a}^{b}w(x)\prod_{j\in[n]\setminus\{i\}}\frac{x-\eta_{j}}{\eta_{i}-\eta_{j}}\,dx,\quad i\in[n]. \tag{9}$$

3. In Theorem III.2, the roots of $q_{n}$, i.e., $\eta_{1},\cdots,\eta_{n}$, are in fact real and distinct. Moreover, the Chebyshev polynomial of the first kind $T_{n}$ has the following roots:

$$\rho^{(n)}_{i}=\cos\left(\frac{2i-1}{2n}\pi\right),\quad i\in[n]. \tag{10}$$

The set $\{\rho^{(n)}_{1},\cdots,\rho^{(n)}_{n}\}$ is often called the $n$-point Chebyshev grid, and its elements $\rho^{(n)}_{1},\cdots,\rho^{(n)}_{n}$ are called “Chebyshev nodes” of degree $n$. We here discard the term “node” and use the term “Chebyshev points” to avoid confusion with computation nodes. We also denote by $\bm{\rho}^{(n)}$ the vector $(\rho_{1}^{(n)},\cdots,\rho_{n}^{(n)})$. It is useful to note that $T_{n}(x)$ can be written as

$$T_{n}(x)=2^{n-1}\prod_{i=1}^{n}\left(x-\rho^{(n)}_{i}\right),$$

and for $T_{n}(x)$, the weights $a_{i}$ in (9) are all equal to ${2}/{n}$ when $w(x)=\frac{2}{\pi\sqrt{1-x^{2}}}.$
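The Gauss quadrature statement can be checked numerically for the Chebyshev case. With $w(x)=\frac{2}{\pi\sqrt{1-x^{2}}}$, the integral of $T_{0}w$ over $[-1,1]$ equals $2$ and the integral of $T_{k}w$ equals $0$ for $k\geq 1$, so the rule with equal weights $2/n$ on the $n$-point Chebyshev grid should reproduce these values for every degree below $2n$ and is not guaranteed to do so at degree $2n$. The following is a minimal sketch; the helper names `chebyshev_points` and `gauss_chebyshev` and the choice `n = 5` are assumptions for this example.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_points(n):
    """n-point Chebyshev grid: rho_i = cos((2i - 1) pi / (2n)), i = 1..n (the roots of T_n)."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def gauss_chebyshev(coeffs, n):
    """Quadrature sum (2/n) * sum_i f(rho_i), with f given by its Chebyshev-basis coefficients."""
    return (2.0 / n) * np.sum(C.chebval(chebyshev_points(n), coeffs))

n = 5
# Row k of the identity matrix is the coefficient vector of T_k.
exact_below_2n = [gauss_chebyshev(np.eye(2 * n)[k], n) for k in range(2 * n)]
at_degree_2n = gauss_chebyshev(np.eye(2 * n + 1)[2 * n], n)
print(np.round(exact_below_2n, 12))   # expected: 2 for k = 0, then 0 for k = 1..2n-1
print(at_degree_2n)                   # the rule is no longer exact here (true integral is 0)
```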

III-B Notations

Throughout this paper, we use lowercase bold letters to denote vectors and uppercase bold letters to denote matrices. In addition, for any positive integers $k,n$, a set of orthogonal polynomials $q_{0},q_{1},\cdots,q_{k-1}$ on the interval $[a,b]$, and a vector $\mathbf{x}=(x_{1},\cdots,x_{n})$ with entries in $[a,b]$, we define the $k\times n$ matrix $\mathbf{Q}^{(k,n)}(\mathbf{x})$ as:

$$\mathbf{Q}^{(k,n)}(\mathbf{x})=\begin{pmatrix}q_{0}(x_{1})&\cdots&q_{0}(x_{n})\\ \vdots&\ddots&\vdots\\ q_{k-1}(x_{1})&\cdots&q_{k-1}(x_{n})\end{pmatrix}$$

For any subset $\mathcal{S}=\{s_{1},\cdots,s_{r}\}\subset[n]$ , we denote by $\mathbf{Q}^{(k,n)}_{\mathcal{S}}(\mathbf{x})$ the sub-matrix of $\mathbf{Q}^{(k,n)}(\mathbf{x})$ formed by concatenating columns with indices in $\mathcal{S}$ , i.e.,

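$$\mathbf{Q}^{(k,n)}_{\mathcal{S}}(\mathbf{x})=\begin{pmatrix}q_{0}(x_{s_{1}})&\cdots&q_{0}(x_{s_{r}})\\ \vdots&\ddots&\vdots\\ q_{k-1}(x_{s_{1}})&\cdots&q_{k-1}(x_{s_{r}})\end{pmatrix}$$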

For the special case where the orthogonal polynomials are the Chebyshev polynomials of the first kind $T_{0},T_{1},\cdots,T_{k-1}$ , we define the $k\times n$ matrix $\mathbf{G}^{(k,n)}(\mathbf{x})$ as:

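$$\mathbf{G}^{(k,n)}(\mathbf{x})=\begin{pmatrix}T_{0}(x_{1})&\cdots&T_{0}(x_{n})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{1})&\cdots&T_{k-1}(x_{n})\end{pmatrix}$$

and we denote by $\mathbf{G}^{(k,n)}_{\mathcal{S}}(\mathbf{x})$ the sub-matrix of $\mathbf{G}^{(k,n)}(\mathbf{x})$ formed by concatenating columns with indices in $\mathcal{S}$, i.e.,

$$\mathbf{G}^{(k,n)}_{\mathcal{S}}(\mathbf{x})=\begin{pmatrix}T_{0}(x_{s_{1}})&\cdots&T_{0}(x_{s_{r}})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{s_{1}})&\cdots&T_{k-1}(x_{s_{r}})\end{pmatrix}$$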

Also, for the case where the orthogonal polynomials are the “orthonormal” Chebyshev polynomials $\frac{1}{\sqrt{2}}T_{0},T_{1},\cdots,T_{k-1}$ , we define the $k\times n$ matrix $\tilde{\mathbf{G}}^{(k,n)}(\mathbf{x})$ as:

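$$\tilde{\mathbf{G}}^{(k,n)}(\mathbf{x})=\begin{pmatrix}\frac{1}{\sqrt{2}}T_{0}(x_{1})&\cdots&\frac{1}{\sqrt{2}}T_{0}(x_{n})\\ T_{1}(x_{1})&\cdots&T_{1}(x_{n})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{1})&\cdots&T_{k-1}(x_{n})\end{pmatrix}$$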

and we denote by $\tilde{\mathbf{G}}^{(k,n)}_{\mathcal{S}}(\mathbf{x})$ the sub-matrix of $\tilde{\mathbf{G}}^{(k,n)}(\mathbf{x})$ formed by concatenating columns with indices in $\mathcal{S}$ , i.e.,

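$$\tilde{\mathbf{G}}^{(k,n)}_{\mathcal{S}}(\mathbf{x})=\begin{pmatrix}\frac{1}{\sqrt{2}}T_{0}(x_{s_{1}})&\cdots&\frac{1}{\sqrt{2}}T_{0}(x_{s_{r}})\\ T_{1}(x_{s_{1}})&\cdots&T_{1}(x_{s_{r}})\\ \vdots&\ddots&\vdots\\ T_{k-1}(x_{s_{1}})&\cdots&T_{k-1}(x_{s_{r}})\end{pmatrix}$$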

Wherever there is no ambiguity on $\mathbf{x}$ , it may be dropped from the notation.
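For readers who want to experiment with this notation, the matrices can be built in a few lines of NumPy. The sketch below assumes the layout described in this section (row $i$ holds the degree-$(i-1)$ polynomial, column $j$ holds the evaluation point $x_{j}$); the function names `G`, `G_tilde`, and `columns` are hypothetical helpers, and `numpy.polynomial.chebyshev.chebvander` is used only as a convenient way to evaluate $T_{0},\cdots,T_{k-1}$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def G(k, x):
    """k x n Chebyshev-Vandermonde matrix whose (i, j) entry is T_{i-1}(x_j)."""
    return C.chebvander(np.asarray(x, dtype=float), k - 1).T

def G_tilde(k, x):
    """Same matrix for the orthonormal family T_0/sqrt(2), T_1, ..., T_{k-1}."""
    M = G(k, x).copy()
    M[0, :] /= np.sqrt(2.0)
    return M

def columns(M, S):
    """Sub-matrix formed by the columns whose 1-based indices are listed in S."""
    return M[:, [s - 1 for s in S]]

# On the n-point Chebyshev grid the rows of G_tilde are orthogonal with squared norm n/2.
n = 4
rho = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))
print(np.round(G_tilde(n, rho) @ G_tilde(n, rho).T, 12))
```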

In the next section, we show that orthonormal polynomials can be used for designing codes for the distributed large scale matrix multiplication problem.

IV OrthoMatDot: Orthonormal Polynomials based Codes for Distributed Matrix Multiplication

In this section, we present a new orthonormal polynomials based class of codes for matrix-multiplication called OrthoMatDot. These codes achieve the same recovery threshold as MatDot Codes, and have similar computational complexity as MatDot. The main advantage of the proposed codes is that they avoid dealing with the ill-conditioned monomial basis used in previous work (e.g., in [3, 2, 5, 26]). In Section V, OrthoMatDot Codes will be specialized and demonstrated to have higher numerical stability as compared with state of the art. We begin with a formal problem formulation in Section IV-A, and describe our codes in Section IV-B.

IV-A System Model and Problem Formulation

IV-A1 System Model

We consider the distributed framework depicted in Fig. 6, which consists of a master node, $P$ worker nodes, and a fusion node, where the only communication allowed is from the master node to the worker nodes and from the worker nodes to the fusion node. The fusion node and the master node may be the same node; in this case, the only communication allowed is between the master node and each worker node.

IV-A2 Problem Formulation

The master node possesses two real-valued input matrices $\mathbf{A}$, $\mathbf{B}$ with dimensions $N_{1}\times N_{2}$, $N_{2}\times N_{3}$, respectively. Every worker node receives from the master node an encoded matrix of $\mathbf{A}$ of dimension $N_{1}\times N_{2}/m$ and an encoded matrix of $\mathbf{B}$ of dimension $N_{2}/m\times N_{3}$, and multiplies these two received inputs. Upon performing the matrix multiplication, each worker node sends the result to the fusion node. The fusion node needs to recover the product $\mathbf{A}\mathbf{B}$ once it receives the results of any $K$ worker nodes, where $K\leq P$. Here, $K$ is called the recovery threshold of the distributed computing scheme.

IV-B OrthoMatDot Code Construction

Our result regarding the existence of achievable codes solving the distributed matrix multiplication problem using orthonormal polynomials is stated in the following theorem.

Theorem IV.1

For the matrix multiplication problem described in Section IV-A2 computed on the system defined in Section IV-A1, a recovery threshold of $2m-1$ is achievable using any set of orthonormal polynomials $\{q_{i}\}_{i\geq 0}$ relative to some weight function $w$ and defined on a range $[a,b]$.

Before proving this theorem, we first present OrthoMatDot, a code construction that achieves the recovery threshold of $2m-1$ given any set $\{q_{i}\}_{i\geq 0}$ of orthonormal polynomials relative to a weight function $w(x)$ and defined on a range $[a,b]$. In our code construction, we assume that matrix $\mathbf{A}$ is split vertically into $m$ equal sub-matrices, of dimension $N_{1}\times N_{2}/m$ each, and matrix $\mathbf{B}$ is split horizontally into $m$ equal sub-matrices, of dimension $N_{2}/m\times N_{3}$ each, as follows:

$$\mathbf{A}=\begin{pmatrix}\mathbf{A}_{0}&\mathbf{A}_{1}&\cdots&\mathbf{A}_{m-1}\end{pmatrix},\qquad\mathbf{B}=\begin{pmatrix}\mathbf{B}_{0}\\ \mathbf{B}_{1}\\ \vdots\\ \mathbf{B}_{m-1}\end{pmatrix}. \tag{38}$$

We also define a set of $P$ distinct real numbers $x_{1},\cdots,x_{P}$ in the range $[a,b]$ and two encoding polynomials $p_{\mathbf{A}}(x)=\sum_{i=0}^{m-1}\mathbf{A}_{i}q_{i}(x)$ and $p_{\mathbf{B}}(x)=\sum_{i=0}^{m-1}\mathbf{B}_{i}q_{i}(x)$, and let $p_{\mathbf{C}}(x)=p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$.

In the following, we briefly describe the OrthoMatDot construction. First, for every $r\in[P]$, the master node sends to the $r$-th worker node evaluations of $p_{\mathbf{A}}(x),p_{\mathbf{B}}(x)$ at $x=x_{r}$, that is, it sends $p_{\mathbf{A}}(x_{r})$ and $p_{\mathbf{B}}(x_{r})$ to the $r$-th worker node. Next, for every $r\in[P]$, the $r$-th worker node computes the matrix product $p_{\mathbf{C}}(x_{r})=p_{\mathbf{A}}(x_{r})p_{\mathbf{B}}(x_{r})$ and sends the result to the fusion node. Once the fusion node receives the output of any $2m-1$ worker nodes, it interpolates the polynomial $p_{\mathbf{C}}(x)=p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$, and evaluates $p_{\mathbf{C}}(x)$ at $\eta_{1},\cdots,\eta_{m}$, where $\eta_{1},\cdots,\eta_{m}$ are the roots of $q_{m}$. Then, it performs the summation $\sum_{r=1}^{m}a_{r}\,p_{\mathbf{C}}(\eta_{r})$, where $a_{1},\cdots,a_{m}$ are as in (9).

We formally present OrthoMatDot code in Construction 1. Construction 1 uses the following notation. The output of the algorithm is the $N_{1}\times N_{3}$ matrix $\hat{\mathbf{C}}.$ The $(i,j)$ -th entries of the matrix polynomial $p_{\mathbf{C}}(x)$ and the matrix $\hat{\mathbf{C}}$ are respectively denoted as $p^{(i,j)}_{\mathbf{C}}(x)$ and $\hat{C}(i,j).$ The reader may also recall the definition of matrices $\mathbf{Q}^{(2m-1,P)}(\mathbf{x})$ and $\mathbf{Q}_{\mathcal{R}}^{(2m-1,P)}(\mathbf{x}),$ for any subset $\mathcal{R}=\{{r_{1}},\cdots,{r_{2m-1}}\}\subset[P]$ . $\bm{\eta}=(\eta_{1},\cdots,\eta_{m})$ is the vector of the roots of $q_{m}$ . Based on Construction 1, we state the following claim.

Claim IV.2

$\mathbf{A}\mathbf{B}=\sum_{r=1}^{m}a_{r}\,p_{\mathbf{C}}(\eta_{r}).$

The proof of Claim IV.2 is provided in Appendix A.
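The full argument is the one in Appendix A; the following is only a brief sketch of why the claim is plausible, using the orthonormality of $\{q_{i}\}_{i\geq 0}$, the splitting of $\mathbf{A}$ and $\mathbf{B}$ into $\mathbf{A}_{0},\cdots,\mathbf{A}_{m-1}$ and $\mathbf{B}_{0},\cdots,\mathbf{B}_{m-1}$, and Theorem III.2 applied entry-wise with $n=m$.

```latex
\begin{align*}
\int_{a}^{b} p_{\mathbf{C}}(x)\,w(x)\,dx
  &= \sum_{i=0}^{m-1}\sum_{j=0}^{m-1}\mathbf{A}_{i}\mathbf{B}_{j}\,\langle q_{i},q_{j}\rangle
   = \sum_{i=0}^{m-1}\mathbf{A}_{i}\mathbf{B}_{i}
   = \mathbf{A}\mathbf{B}, \\
\sum_{r=1}^{m} a_{r}\,p_{\mathbf{C}}(\eta_{r})
  &= \int_{a}^{b} p_{\mathbf{C}}(x)\,w(x)\,dx
  \qquad\text{since every entry of } p_{\mathbf{C}} \text{ has degree at most } 2m-2 < 2m .
\end{align*}
```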

Now, we can prove Theorem IV.1.

Proof:

In order to prove the theorem, it suffices to show that Construction 1 is a valid construction with a recovery threshold of $2m-1$. Therefore, in the following, we prove that Construction 1 can recover $\mathbf{A}\mathbf{B}$ once the fusion node receives the output of any $2m-1$ worker nodes. Assume that the fusion node has received the results of some $2m-1$ worker nodes. Because the polynomial $p_{\mathbf{C}}(x)$ has degree $2m-2$, its evaluations at any $2m-1$ distinct points are sufficient to interpolate it, and since $x_{1},\cdots,x_{P}$ are distinct, the fusion node can interpolate $p_{\mathbf{C}}(x)$ from these results. Afterwards, given that $\mathbf{A}\mathbf{B}=\sum_{r=1}^{m}a_{r}\,p_{\mathbf{C}}(\eta_{r})$ (Claim IV.2), the fusion node can evaluate $p_{\mathbf{C}}(\eta_{1}),\cdots,p_{\mathbf{C}}(\eta_{m})$ and perform the weighted summation $\sum_{r=1}^{m}a_{r}\,p_{\mathbf{C}}(\eta_{r})$ to recover $\mathbf{A}\mathbf{B}$. ∎

Remark IV.1

In Construction 1, setting $x_{1},\cdots,x_{m}$ to be the roots of $q_{m}$ leads to faster decoding in scenarios where the first $m$ worker nodes send their results but fewer than $2m-1$ workers in total succeed in sending their outputs. In such scenarios, we have $\sum_{r=1}^{m}a_{r}\,p_{\mathbf{C}}(x_{r})=\sum_{r=1}^{m}a_{r}\,p_{\mathbf{C}}(\eta_{r})=\mathbf{A}\mathbf{B}$, where the last equality follows from Claim IV.2.
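The shortcut in Remark IV.1 can be illustrated with a small simulation. The sketch below assumes the Chebyshev family $\frac{1}{\sqrt{2}}T_{0},T_{1},\cdots$ as the orthonormal polynomials (so that the roots of $q_{m}$ are the $m$-point Chebyshev grid and the weights are $2/m$), and uses small random matrices whose sizes are arbitrary choices for the example. Only the outputs of the first $m$ workers are used, and no interpolation is performed.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)
m, N1, N2, N3 = 3, 4, 6, 5
A, B = rng.standard_normal((N1, N2)), rng.standard_normal((N2, N3))
A_blocks = np.split(A, m, axis=1)    # A_0, ..., A_{m-1}, each N1 x N2/m
B_blocks = np.split(B, m, axis=0)    # B_0, ..., B_{m-1}, each N2/m x N3

idx = np.arange(1, m + 1)
eta = np.cos((2 * idx - 1) * np.pi / (2 * m))   # roots of T_m
Q = C.chebvander(eta, m - 1).T.copy()           # Q[i, r] = T_i(eta_r)
Q[0] /= np.sqrt(2.0)                            # orthonormal scaling of T_0

# Worker r returns p_A(eta_r) p_B(eta_r); the fusion node only forms a weighted sum.
C_hat = (2.0 / m) * sum(
    sum(Q[i, r] * A_blocks[i] for i in range(m))
    @ sum(Q[i, r] * B_blocks[i] for i in range(m))
    for r in range(m)
)
print(np.allclose(C_hat, A @ B))
```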

Next, we study the computational and communication costs of OrthoMatDot.

IV-B1 Complexity Analyses of OrthoMatDot

Encoding Complexity: Encoding for each worker requires performing two additions, each adding $m$ scaled matrices of size $N_{1}N_{2}/m$ and $N_{2}N_{3}/m$ , for an overall encoding complexity for each worker of $O(N_{1}N_{2}+N_{2}N_{3})$ . Therefore, the overall computational complexity of encoding for $P$ workers is $O(N_{1}N_{2}P+N_{2}N_{3}P)$ .

Computational Cost per Worker: Each worker multiplies two matrices of dimensions $N_{1}\times N_{2}/m$ and $N_{2}/m\times N_{3}$ , requiring $O(N_{1}N_{2}N_{3}/m)$ operations.

Decoding Complexity: Since $p_{\mathbf{C}}(x)$ has degree $2m-2$, the interpolation of $p_{\mathbf{C}}(x)$ requires the inversion of a $(2m-1)\times(2m-1)$ matrix, with complexity $O(m^{3})$, followed by $N_{1}N_{3}$ matrix-vector multiplications, each between the inverted matrix and a column vector of length $2m-1$ containing the received evaluations of the matrix polynomial $p_{\mathbf{C}}(x)$ at some position $(i,j)\in[N_{1}]\times[N_{3}]$, with total complexity $O(N_{1}N_{3}m^{2})$. Next, the evaluation of the polynomial $p_{\mathbf{C}}(x)$ at $\eta_{1},\cdots,\eta_{m}$ requires a complexity of $O(N_{1}N_{3}m^{2})$. Finally, performing the summation $\sum_{r=1}^{m}a_{r}p_{\mathbf{C}}(\eta_{r})$ requires a complexity of $O(N_{1}N_{3}m)$. Thus, assuming that $m\ll N_{1},N_{3}$, the overall decoding complexity is $O(m^{3}+2N_{1}N_{3}m^{2}+N_{1}N_{3}m)=O(N_{1}N_{3}m^{2})$.

Communication Cost: The master node sends $O(N_{1}N_{2}P/m+N_{2}N_{3}P/m)$ symbols, and the fusion node receives $O(N_{1}N_{3}m)$ symbols from the successful worker nodes.

Remark IV.2

With the reasonable assumption that the dimensions of the input matrices $\mathbf{A},\mathbf{B}$ are large enough such that $N_{1},N_{2},N_{3}\gg m,P$ , we can conclude that the encoding and decoding costs at the master and fusion nodes, respectively, are negligible compared to the computation cost at each worker node.

V Numerically Stable Codes for Matrix Multiplication via OrthoMatDot Codes with Chebyshev Polynomials

In this section, we specialize OrthoMatDot Codes by restricting the orthonormal polynomials to be Chebyshev polynomials of the first kind $\{T_{i}\}_{i\geq 0}$ with the evaluation points chosen to be the $P$-point Chebyshev grid, i.e., $x_{i}=\rho_{i}^{(P)},i\in[P]$. Our specialized OrthoMatDot, described in Construction 2 in Section V-A, uses a decoding procedure that involves the inversion of a $(2m-1)\times(2m-1)$ sub-matrix of a $(2m-1)\times P$ Chebyshev-Vandermonde matrix. One of the main technical results of this section (and of the paper), presented in Theorem V.1 in Section V-B, is an upper bound on the worst case condition number over all possible $(2m-1)\times(2m-1)$ sub-matrices of the $(2m-1)\times P$ Chebyshev-Vandermonde matrix for the case where the distinct evaluation points $x_{1},\cdots,x_{P}$ are chosen as the Chebyshev points of degree $P$, i.e., $x_{i}=\rho_{i}^{(P)},i\in[P]$. In fact, the derived bound shows that the worst case condition number grows at most polynomially in $P$ for a fixed number of straggler/parity worker nodes. This is in contrast with the monomial basis codes, where the condition number grows exponentially in $P$, even when there is no redundancy [16, 17, 18, 19]. We show through numerical experiments in Section V-C that our proposed codes provide significantly lower numerical errors as compared to MatDot Codes in [3].

V-A Chebyshev Polynomials based OrthoMatDot Code Construction

Recalling from Example III.1 that $\frac{1}{\sqrt{2}}T_{0},T_{1},T_{2},\cdots$ form an orthonormal polynomial set relative to the weight function $w(x)=\frac{2}{\pi\sqrt{1-x^{2}}}$ , in Construction 2, we explain the application of Chebyshev polynomials of the first kind to Construction 1. Note that, in Construction 2, we assume that the input matrices $\mathbf{A}$ and $\mathbf{B}$ are also split as in (38), and let $x_{1},x_{2},\ldots,x_{P}$ be distinct real numbers in the range $[-1,1]$ , and define the encoding functions $p_{\mathbf{A}}(x),p_{\mathbf{B}}(x)$ as $p_{\mathbf{A}}(x)=\frac{1}{\sqrt{2}}\mathbf{A}_{0}T_{0}(x)+\sum_{i=1}^{m-1}\mathbf{A}_{i}T_{i}(x)$ and $p_{\mathbf{B}}(x)=\frac{1}{\sqrt{2}}\mathbf{B}_{0}T_{0}(x)+\sum_{i=1}^{m-1}\mathbf{B}_{i}T_{i}(x),$ and let $p_{\mathbf{C}}(x)=p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ .

The idea of our Chebyshev polynomials based OrthoMatDot code is as follows. First, for every $r\in[P]$, the master node sends to the $r$-th worker node $p_{\mathbf{A}}(\rho^{(P)}_{r})$ and $p_{\mathbf{B}}(\rho^{(P)}_{r})$. Next, for every $r\in[P]$, the $r$-th worker node computes the matrix product $p_{\mathbf{C}}(\rho^{(P)}_{r})=p_{\mathbf{A}}(\rho^{(P)}_{r})p_{\mathbf{B}}(\rho^{(P)}_{r})$ and sends the result to the fusion node. Once the fusion node receives the output of any $2m-1$ worker nodes, it interpolates $p_{\mathbf{C}}(x)$. Then, it evaluates $p_{\mathbf{C}}(x)$ at $\rho^{(m)}_{1},\cdots,\rho^{(m)}_{m}$, where the $\rho^{(m)}_{i}$ are as defined in (10), and computes $\sum_{i=1}^{m}a_{i}\,p_{\mathbf{C}}(\rho^{(m)}_{i})$, where $a_{i}={2}/{m},i\in[m]$, based on item 3 in Remark III.1.

A formal description of our Chebyshev polynomials based OrthoMatDot code is provided in Construction 2, which uses the following notation. We let the $(i,j)$-th entry of the matrix polynomial $p_{\mathbf{C}}(x)$ be denoted by $p^{(i,j)}_{\mathbf{C}}(x)$ and written as $p^{(i,j)}_{\mathbf{C}}(x)=\frac{1}{\sqrt{2}}c_{0}^{(i,j)}T_{0}(x)+\sum_{l=1}^{2m-2}c_{l}^{(i,j)}T_{l}(x)$. Also, following the notation in Section III-B, we define the Chebyshev-Vandermonde matrices $\tilde{\mathbf{G}}^{(2m-1,P)}(\bm{\rho}^{(P)})$ and $\tilde{\mathbf{G}}_{\mathcal{R}}^{(2m-1,P)}(\bm{\rho}^{(P)})$ for any subset $\mathcal{R}=\{{r_{1}},\cdots,{r_{2m-1}}\}\subset[P]$, as well as the matrix $\tilde{\mathbf{G}}^{(2m-1,m)}(\bm{\rho}^{(m)})$. Finally, we assume that our construction returns an $N_{1}\times N_{3}$ matrix $\hat{\mathbf{C}}$ representing the result of the product $\mathbf{A}\mathbf{B}$, where the $(i,j)$-th entry of $\hat{\mathbf{C}}$ is $\hat{C}(i,j)$.
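The steps described above can be sketched end to end in NumPy. This is an illustrative simulation under stated assumptions, not a reproduction of Construction 2: all workers are simulated in one process, worker indices are 0-based, the function names `cheb_grid`, `basis`, and `ortho_matdot` are hypothetical, and the interpolation is done with `np.linalg.solve` on the $(2m-1)\times(2m-1)$ sub-matrix rather than by forming an explicit inverse.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_grid(n):
    """n-point Chebyshev grid rho_i = cos((2i - 1) pi / (2n)), i = 1..n."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def basis(k, x):
    """k x len(x) matrix of T_0/sqrt(2), T_1, ..., T_{k-1} evaluated at the points x."""
    M = C.chebvander(x, k - 1).T.copy()
    M[0] /= np.sqrt(2.0)
    return M

def ortho_matdot(A, B, m, P, survivors):
    """Encode, run the surviving workers, and decode A @ B from 2m - 1 worker outputs."""
    A_blocks = np.split(A, m, axis=1)            # A_i: N1 x N2/m
    B_blocks = np.split(B, m, axis=0)            # B_i: N2/m x N3
    E = basis(m, cheb_grid(P))                   # E[i, r]: coefficient of block i at worker r
    R = sorted(survivors)
    assert len(R) == 2 * m - 1, "decoding needs exactly 2m - 1 worker outputs"
    # Worker r computes p_C(x_r) = p_A(x_r) p_B(x_r); each output is flattened into a row.
    Y = np.stack([
        (sum(E[i, r] * A_blocks[i] for i in range(m))
         @ sum(E[i, r] * B_blocks[i] for i in range(m))).ravel()
        for r in R
    ])
    Gt = basis(2 * m - 1, cheb_grid(P)[R])       # square sub-matrix for the survivors
    coeffs = np.linalg.solve(Gt.T, Y)            # coefficients of p_C in the orthonormal basis
    vals = basis(2 * m - 1, cheb_grid(m)).T @ coeffs   # p_C at the m-point Chebyshev grid
    return (2.0 / m) * vals.sum(axis=0).reshape(A.shape[0], B.shape[1])

rng = np.random.default_rng(0)
m, P = 3, 8
A, B = rng.standard_normal((4, 6)), rng.standard_normal((6, 5))
C_hat = ortho_matdot(A, B, m, P, survivors=[0, 2, 3, 5, 7])
rel_err = np.linalg.norm(A @ B - C_hat) / np.linalg.norm(A @ B)
print(rel_err)
```

Solving one linear system with all $N_{1}N_{3}$ right-hand sides at once is a design choice of this sketch; it plays the role of the $N_{1}N_{3}$ matrix-vector multiplications in the complexity analysis.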

V-A1 Complexity Analyses:

The encoding complexity, computational complexity per worker, decoding complexity, and communication cost of Chebyshev polynomials based OrthoMatDot are the same as their counterparts for OrthoMatDot stated in Section IV-B1.

V-B Evaluation Points and Condition Number Bound

When there is no redundancy, i.e., $n=2m-1$, it is well known that the $n\times n$ decoding matrix $\mathbf{G}^{(n,n)}$ has condition number $n$ with the $\ell_{2}$ as well as the Frobenius norms [17]. Note the remarkable contrast with the Vandermonde matrix, whose condition number for real-valued evaluation points grows exponentially in $n$, no matter how the nodes are chosen [16, 17]. Our problem differs from the standard problem in numerical methods, since we have to choose a rectangular “generator” matrix in which every square sub-matrix is well-conditioned. In particular, even a Chebyshev-Vandermonde matrix is poorly conditioned if the evaluation points are not chosen carefully [19] (also see Fig. 8). Here, we show that choosing $x_{i}=\rho_{i}^{(n)}$ leads to a well-conditioned system with $s$ redundant nodes. Our goal is to choose the vector $\mathbf{x}$ such that $\kappa^{max}(\mathbf{G}^{(n-s,n)}(\mathbf{x}))$ is sufficiently small, where $\kappa^{max}(\mathbf{G}^{(n-s,n)}(\mathbf{x}))$ denotes the worst case condition number over all possible $(n-s)\times(n-s)$ sub-matrices of $\mathbf{G}^{(n-s,n)}(\mathbf{x})$.

Theorem V.1

For any $s\in[n-1]$ ,

[TABLE]

where $\kappa^{max}_{F}$ denotes the worst case condition number over all possible $n-s\times n-s$ sub-matrices of $\mathbf{G}^{(n-s,n)}(\mathbf{x})$ with respect to the Frobenius norm, $\bm{\rho}^{(n)}=(\rho_{1}^{(n)},\rho_{2}^{(n)},\ldots,\rho_{n}^{(n)})$ are the roots of the Chebyshev polynomial $T_{n}$ , i.e., $\rho^{(n)}_{i}=\cos\left(\frac{2i-1}{2n}\pi\right),i\in[n]$ .

Since $||.||_{2}\leq||.||_{F},$ the above bound applies to the standard $\ell_{2}$ matrix norm as well. The proof uses techniques from numerical methods, and is provided in Appendix B.

Remark V.1

Although the bound in Theorem V.1 is derived for $\mathbf{G}^{(n-s,n)}(\bm{\rho}^{(n)})$, the theorem also applies to $\tilde{\mathbf{G}}^{(n-s,n)}(\bm{\rho}^{(n)})$. This is because it can be shown using simple matrix operations that for any $\tilde{\mathbf{G}}^{(n-s,n)}_{\mathcal{R}}$, for a subset $\mathcal{R}\subset[n]$ such that $|\mathcal{R}|=n-s$, $\kappa_{F}(\tilde{\mathbf{G}}^{(n-s,n)}_{\mathcal{R}})<\sqrt{2}\,\kappa_{F}(\mathbf{G}^{(n-s,n)}_{\mathcal{R}}).$

V-C Numerical Results

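The numerical stability of our codes is determined by the condition number of the $(2m-1)\times(2m-1)$ sub-matrices of $\mathbf{G}^{(2m-1,P)}.$ The natural comparison is with MatDot Codes, where the decoding depends on effectively inverting $(2m-1)\times(2m-1)$ square sub-matrices of

$$\mathbf{M}=\begin{pmatrix}1&1&\cdots&1\\ x_{1}&x_{2}&\cdots&x_{P}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1}^{2m-2}&x_{2}^{2m-2}&\cdots&x_{P}^{2m-2}\end{pmatrix}.$$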

Based on the result of Theorem V.1, we choose $x_{i}=\rho_{i}^{(P)}.$ In our experiments, we consider systems with various numbers of worker nodes, namely, $P=16,30,60,80,100$. We compare $\kappa^{max}_{2}(\mathbf{G}^{(2m-1,P)})$ with $\kappa^{max}_{2}(\mathbf{M})$. We also compare the average $\ell_{2}$ condition number of all $(2m-1)\times(2m-1)$ sub-matrices of $\mathbf{G}^{(2m-1,P)}$ with that of all $(2m-1)\times(2m-1)$ sub-matrices of $\mathbf{M}$. The results, in Fig. 7, show that, for every examined system, the maximum and average condition numbers of the $(2m-1)\times(2m-1)$ sub-matrices of $\mathbf{G}^{(2m-1,P)}$ are lower than their MatDot Codes counterparts, especially for larger systems with $60,80,$ and $100$ worker nodes. In fact, for these specific systems, the condition number improves by a factor of around $10^{15}$.
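A scaled-down version of this comparison can be run by brute force. The sketch below enumerates every $(2m-1)\times(2m-1)$ column sub-matrix for a small system and reports the maximum and mean $\ell_{2}$ condition numbers for the Chebyshev-Vandermonde matrix and for a monomial Vandermonde matrix. The system size ($m=4$, $P=10$) is chosen only to keep the enumeration small, and evaluating the monomial matrix at the same Chebyshev points is an assumption of this example; the evaluation points used for MatDot Codes in the experiments are not specified in this section.

```python
import itertools
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_grid(n):
    """n-point Chebyshev grid rho_i = cos((2i - 1) pi / (2n)), i = 1..n."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def worst_and_mean_cond(M):
    """Max and mean l2 condition number over all k x k column sub-matrices of a k x P matrix."""
    k, P = M.shape
    conds = [np.linalg.cond(M[:, list(S)]) for S in itertools.combinations(range(P), k)]
    return max(conds), float(np.mean(conds))

m, P = 4, 10                  # k = 2m - 1 = 7 rows, so P - k = 3 redundant workers
k = 2 * m - 1
x = cheb_grid(P)
G = C.chebvander(x, k - 1).T                  # Chebyshev-Vandermonde matrix, k x P
M = np.vander(x, k, increasing=True).T        # monomial Vandermonde matrix, k x P
cheb_max, cheb_mean = worst_and_mean_cond(G)
mono_max, mono_mean = worst_and_mean_cond(M)
print(f"Chebyshev basis: max {cheb_max:.2e}, mean {cheb_mean:.2e}")
print(f"Monomial basis:  max {mono_max:.2e}, mean {mono_mean:.2e}")
```

The number of sub-matrices grows combinatorially in $P$, so this exhaustive enumeration is practical only for small systems or a small number of redundant workers.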

Fig. 8 shows how the maximum/average condition number of the $2m-1\times 2m-1$ sub-matrices of $\mathbf{G}^{(2m-1,P)}$ grows with the size of the distributed system given a fixed number of redundant worker nodes, namely 1 and 3, and compares with MatDot Codes. The figure shows that while MatDot Codes provide a reasonable condition number $(\sim 10^{10})$ to distributed systems with size up to only $25$ worker nodes, Construction 2 can afford distributed systems with size up to $150$ worker nodes for the same condition number bound $\sim 10^{10}$ .

Reflecting the significantly higher stability of Chebyshev polynomials based OrthoMatDot compared to MatDot Codes, Fig. 9 shows that Chebyshev polynomials based OrthoMatDot provides much more accurate outputs than MatDot Codes. For the experiments whose results are shown in Fig. 9, the entries of the input matrices $\mathbf{A},\mathbf{B}$ are chosen independently according to the standard Gaussian distribution $\mathcal{N}(0,1)$. In addition, for any two input matrices $\mathbf{A},\mathbf{B}$, let $\hat{\mathbf{C}}$ be the output of the distributed system (which is not necessarily equal to the correct answer $\mathbf{A}\mathbf{B}$). We define the relative error between $\mathbf{A}\mathbf{B}$ and $\hat{\mathbf{C}}$ to be

$$\frac{||\mathbf{A}\mathbf{B}-\hat{\mathbf{C}}||_{F}}{||\mathbf{A}\mathbf{B}||_{F}}.$$
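The relative error in the Frobenius norm is straightforward to compute. The helper below is a minimal sketch; the function name `relative_error`, the matrix sizes, and the synthetic noise level are illustrative assumptions and do not reproduce the experiments of this section.

```python
import numpy as np

def relative_error(A, B, C_hat):
    """Frobenius-norm relative error ||AB - C_hat||_F / ||AB||_F."""
    AB = A @ B
    return np.linalg.norm(AB - C_hat, "fro") / np.linalg.norm(AB, "fro")

rng = np.random.default_rng(0)
A, B = rng.standard_normal((6, 4)), rng.standard_normal((4, 5))
# A synthetic "decoded" output: the exact product plus a small perturbation.
noisy = A @ B + 1e-6 * rng.standard_normal((6, 5))
print(relative_error(A, B, A @ B), relative_error(A, B, noisy))
```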

Fig. 9 shows how the maximum relative error (the worst case relative error, for a fixed number of parity workers $s$, over all scenarios of $P-s$ successful nodes) grows with the size of the distributed system. In Fig. 9, we plot the average over five different realizations of the system at each system size $P$. The figure shows that MatDot Codes break down once the size of the system exceeds $50$ workers, yielding a relative error of around $10^{5}$. On the other hand, our OrthoMatDot construction can support systems with up to $150$ worker nodes while keeping the relative error below $10^{-5}$. It is also worth mentioning that in our experiments, we use the MATLAB command $inv()$ [29] for matrix inversion. We also tried matrix inversion through the Björck-Pereyra algorithm [30]; however, its results were much less accurate than those of $inv()$, especially for large systems with more than $50$ worker nodes.

Remark V.2

A main challenge in this work is that we assume operations over the real field. For finite fields, one can always perform arithmetic operations with no errors. This fact may suggest a simple solution to the numerical stability of real-valued computations: round the computation’s inputs to elements of a finite field and perform the computations over this finite field. However, such a solution has limited applicability, especially for inputs with a wide range, for the following reason. Since performing arithmetic operations over a finite field $\mathbb{F}_{2^{n}}$ requires representing each element of $\mathbb{F}_{2^{n}}$ as an element of $\mathbb{F}_{2}^{n}$ through a bit representation, this solution is applicable on machines with fixed point operations and word sizes of at least $n$. It is not applicable on machines with floating point operations, since in a floating point representation not all the intermediate values between the minimum and the maximum representable values can be represented. This is a drawback of the floating point representation relative to the fixed point representation, although a floating point representation can represent a wider range of values than a fixed point representation for the same word size.

VI OrthoPoly: Low Communication/Computation Numerically Stable Codes for Distributed Matrix Multiplication

While MatDot Codes [3] have an optimal recovery threshold of $2m-1$, they have a higher computation cost per worker and a higher worker-to-fusion-node communication cost than Polynomial Codes [2]. In this section, motivated by the condition number bound in Theorem V.1, we use Chebyshev polynomials to provide a numerically stable code construction for matrix multiplication that has the same low communication/computation costs as Polynomial Codes, as well as the same recovery threshold. However, as will be shown in this section, our proposed codes, denoted by OrthoPoly, provide lower numerical errors than Polynomial Codes. In this section, we follow the same system model as in Section IV-A1, and solve the problem formulated in Section VI-A. We provide a motivating example in Section VI-B, and then the general code construction in Section VI-C. Finally, in Section VI-D, we show experimentally that OrthoPoly Codes achieve lower numerical errors as compared to Polynomial Codes.

VI-A Problem Formulation

The master node possesses two real-valued input matrices $\mathbf{A}$ , $\mathbf{B}$ with dimensions $N_{1}\times N_{2}$ , $N_{2}\times N_{3}$ , respectively. Every worker node receives from the master node an encoded matrix of $\mathbf{A}$ of dimension $N_{1}/m\times N_{2}$ and an encoded matrix of $\mathbf{B}$ of dimension $N_{2}\times N_{3}/n$ , and performs matrix multiplication of these two received inputs. Upon performing the matrix multiplication, each worker node sends the result to the fusion node. The fusion node needs to recover the matrix multiplication $\mathbf{A}\mathbf{B}$ once it receives the results of any $mn$ worker nodes.

VI-B Example $(m=n=3)$

Consider computing the matrix multiplication $\mathbf{A}\mathbf{B}$ , for some two real matrices $\mathbf{A},\mathbf{B}$ of dimensions $N_{1}\times N_{2}$ and $N_{2}\times N_{3}$ , respectively, over a distributed system of $P\geq 9$ workers such that:

1. Each worker receives an encoded matrix of $\mathbf{A}$ of dimension $N_{1}/3\times N_{2}$ , and an encoded matrix of $\mathbf{B}$ of dimension $N_{2}\times N_{3}/3$ .

2. The product $\mathbf{A}\mathbf{B}$ can be recovered by the fusion node given the results of any $9$ worker nodes.

A solution can be as follows: First, matrices $\mathbf{A},\mathbf{B}$ can be partitioned as

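$$\mathbf{A}=\begin{pmatrix}\mathbf{A}_{0}\\ \mathbf{A}_{1}\\ \mathbf{A}_{2}\end{pmatrix},\qquad \mathbf{B}=\begin{pmatrix}\mathbf{B}_{0}&\mathbf{B}_{1}&\mathbf{B}_{2}\end{pmatrix},$$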

where, for any $i\in\{0,1,2\}$ , $\mathbf{A}_{i}$ has dimension $N_{1}/3\times N_{2}$ , and $\mathbf{B}_{i}$ has dimension $N_{2}\times N_{3}/3$ . Next, let

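$$p_{\mathbf{A}}(x)=\mathbf{A}_{0}T_{0}(x)+\mathbf{A}_{1}T_{1}(x)+\mathbf{A}_{2}T_{2}(x),\qquad p_{\mathbf{B}}(x)=\mathbf{B}_{0}T_{0}(x)+\mathbf{B}_{1}T_{3}(x)+\mathbf{B}_{2}T_{6}(x).$$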

Now, $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ can be written as

[TABLE]

Since $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ is a degree $8$ polynomial, once the fusion node receives the output of any $9$ workers, it can interpolate $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ , i.e., obtain its matrix coefficients. Denote these matrix coefficients by $\mathbf{C}_{T_{0}},\cdots,\mathbf{C}_{T_{8}}$ ; that is, for any $i\in\{0,\cdots,8\}$ , $\mathbf{C}_{T_{i}}$ is the matrix coefficient of $T_{i}$ in $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ . Now, recalling (49), the product $\mathbf{A}\mathbf{B}$ can be written as

[TABLE]

While the obtained set of matrix coefficients $\{\mathbf{C}_{T_{i}}:i\in\{0,\cdots,8\}\}$ is not equal to $\{\mathbf{A}_{i}\mathbf{B}_{j}:i,j\in\{0,1,2\}\}$ , the $\mathbf{C}_{T_{i}}$ ’s are linear combinations of the $\mathbf{A}_{i}\mathbf{B}_{j}$ ’s. Specifically, for any $\mathbf{C}_{T_{i}}$ , $i\in\{0,\cdots,8\}$ , let $\mathbf{C}_{T_{i}}^{(k,l)}$ be its $(k,l)$ -th entry and, for any $i,j\in\{0,1,2\}$ , let $(\mathbf{A}_{i}\mathbf{B}_{j})^{(k,l)}$ be the $(k,l)$ -th entry of the product $\mathbf{A}_{i}\mathbf{B}_{j}$ . We can then write

[TABLE]

for any $(k,l)\in[N_{1}/3]\times[N_{3}/3]$ . Thus, the products $\mathbf{A}_{i}\mathbf{B}_{j},i,j\in\{0,1,2\}$ can be obtained by computing

[TABLE]

for all $(k,l)\in[N_{1}/3]\times[N_{3}/3]$ . In the following, we provide the general code construction.

VI-C OrthoPoly Code Construction

We assume that matrix $\mathbf{A}$ is split horizontally into $m$ equal sub-matrices, of dimension $N_{1}/m\times N_{2}$ each, and matrix $\mathbf{B}$ is split vertically into $n$ equal sub-matrices, of dimension $N_{2}\times N_{3}/n$ each, as follows:

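$$\mathbf{A}=\begin{pmatrix}\mathbf{A}_{0}\\ \mathbf{A}_{1}\\ \vdots\\ \mathbf{A}_{m-1}\end{pmatrix},\qquad \mathbf{B}=\begin{pmatrix}\mathbf{B}_{0}&\mathbf{B}_{1}&\cdots&\mathbf{B}_{n-1}\end{pmatrix},$$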

and define two encoding polynomials $p_{\mathbf{A}}(x)=\sum_{i=0}^{m-1}\mathbf{A}_{i}T_{i}(x)$ and $p_{\mathbf{B}}(x)=\sum_{i=0}^{n-1}\mathbf{B}_{i}T_{im}(x),$ and let $p_{\mathbf{C}}(x)=p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ . We describe, next, the idea of the general code construction. First, for all $r\in[P]$ , the master node sends to the $r$ -th worker evaluations of $p_{\mathbf{A}}(x)$ and $p_{\mathbf{B}}(x)$ at $x=\rho^{(P)}_{r}$ , that is, it sends $p_{\mathbf{A}}(\rho^{(P)}_{r})$ and $p_{\mathbf{B}}(\rho^{(P)}_{r})$ to the $r$ -th worker. Next, for every $r\in[P]$ , the $r$ -th worker node computes the matrix product $p_{\mathbf{C}}(\rho^{(P)}_{r})=p_{\mathbf{A}}(\rho^{(P)}_{r})p_{\mathbf{B}}(\rho^{(P)}_{r})$ and sends the result to the fusion node. Once the fusion node receives the output of any $mn$ worker nodes, it interpolates $p_{\mathbf{C}}(x)$ . Next, the fusion node recovers the products $\mathbf{A}_{i}\mathbf{B}_{j},i\in\{0,\cdots,m-1\},j\in\{0,\cdots,n-1\}$ , from the matrix coefficients of $p_{\mathbf{C}}(x)$ using a low complexity matrix-vector multiplication, specified later in Construction 3. We formally present our OrthoPoly Codes in Construction 3. In the following, we explain the notation used in Construction 3. The output of the algorithm is the $N_{1}\times N_{3}$ matrix $\hat{\mathbf{C}},$ where the $(k,l)$ -th block of $\hat{\mathbf{C}}$ is the $N_{1}/m\times N_{3}/n$ matrix $\hat{\mathbf{C}}_{k,l}$ , and the $(i,j)$ -th entry of any matrix $\hat{\mathbf{C}}_{k,l}$ is $\hat{{c}}_{k,l}^{(i,j)}$ . The $(i,j)$ -th entry of the matrix polynomial $p_{\mathbf{C}}(x)$ is denoted as $p^{(i,j)}_{\mathbf{C}}(x)$ , and Section III-B defines matrices $\mathbf{G}^{(mn,P)}(\bm{\rho}^{(P)})$ and $\mathbf{G}^{(mn,P)}_{\mathcal{R}}(\bm{\rho}^{(P)})$ , for any subset $\mathcal{R}=\{{r_{1}},\cdots,{r_{mn}}\}\subset[P]$ . 
In addition, $\mathbf{H}$ is an $mn\times mn$ matrix of the form $\mathbf{H}=\begin{pmatrix}\mathbf{H}_{0}&\mathbf{H}_{1}&\cdots&\mathbf{H}_{n-1}\end{pmatrix},$ where $\mathbf{H}_{0}$ is an $mn\times m$ matrix with ones on the main diagonal and zeros elsewhere, and for any $i\in\{1,\cdots,n-1\}$ , $\mathbf{H}_{i}$ is an $mn\times m$ matrix of the following structure

[TABLE]

where the value $1$ in the first column is at the $(im+1)$ -th row of $\mathbf{H}_{i}$ .
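The encode, compute, interpolate, and recover steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's Construction 3: the Chebyshev grid is assumed to be the roots of $T_{P}$ , the function names are hypothetical, and the final recovery step solves a linear system built numerically from the expansions of $T_{i}(x)T_{jm}(x)$ instead of using the structured matrix $\mathbf{H}$ .

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def chebyshev_grid(P):
    """P-point Chebyshev grid (roots of T_P); assumed form of rho^(P)."""
    r = np.arange(1, P + 1)
    return np.cos((2 * r - 1) * np.pi / (2 * P))


def product_to_coefficient_map(m, n):
    """Matrix M with (Chebyshev coefficients of p_C) = M @ (stacked A_i B_j).

    Column j*m + i holds the Chebyshev expansion of T_i(x) * T_{jm}(x).
    """
    M = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            t_i = np.zeros(i + 1)
            t_i[i] = 1.0
            t_jm = np.zeros(j * m + 1)
            t_jm[j * m] = 1.0
            expansion = cheb.chebmul(t_i, t_jm)
            M[: len(expansion), j * m + i] = expansion
    return M


def orthopoly_multiply(A, B, m, n, P, survivors):
    """Recover A @ B from the first mn worker indices listed in `survivors`."""
    A_blocks = np.split(A, m, axis=0)   # A_0, ..., A_{m-1}
    B_blocks = np.split(B, n, axis=1)   # B_0, ..., B_{n-1}
    rho = chebyshev_grid(P)
    V = cheb.chebvander(rho, m * n - 1)  # V[r, k] = T_k(rho_r)

    R = list(survivors)[: m * n]
    outputs = []
    for r in R:
        enc_A = sum(V[r, i] * A_blocks[i] for i in range(m))      # p_A(rho_r)
        enc_B = sum(V[r, j * m] * B_blocks[j] for j in range(n))  # p_B(rho_r)
        outputs.append((enc_A @ enc_B).ravel())                   # worker r's product

    # Interpolate p_C in the Chebyshev basis, then undo the product identity.
    coeffs = np.linalg.solve(V[R, :], np.stack(outputs))
    products = np.linalg.solve(product_to_coefficient_map(m, n), coeffs)

    rows, cols = A.shape[0] // m, B.shape[1] // n
    C_hat = np.zeros((A.shape[0], B.shape[1]))
    for i in range(m):
        for j in range(n):
            block = products[j * m + i].reshape(rows, cols)
            C_hat[i * rows:(i + 1) * rows, j * cols:(j + 1) * cols] = block
    return C_hat


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 6))
C_hat = orthopoly_multiply(A, B, m=3, n=3, P=12, survivors=[11, 0, 5, 2, 9, 3, 7, 1, 10])
```

In this sketch the map from the products $\mathbf{A}_{i}\mathbf{B}_{j}$ to the Chebyshev coefficients is triangular with nonzero diagonal when the columns are ordered by $jm+i$ , so the second solve is always well defined.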

VI-C1 Complexity Analyses of OrthoPoly

Encoding Complexity: Encoding for each worker requires performing two additions, the first one adds $m$ scaled matrices of size $N_{1}N_{2}/m$ and the other adds $n$ scaled matrices of size $N_{2}N_{3}/n$ , for an overall encoding complexity for each worker of $O(N_{1}N_{2}+N_{2}N_{3})$ . Therefore, the overall computational complexity of encoding for $P$ workers is $O(N_{1}N_{2}P+N_{2}N_{3}P)$ .

Computational Cost per Worker: Each worker multiplies two matrices of dimensions $N_{1}/m\times N_{2}$ and $N_{2}\times N_{3}/n$ , requiring $O(N_{1}N_{2}N_{3}/(mn))$ operations.

Decoding Complexity: Since $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ has degree $mn-1$ , the interpolation of $p_{\mathbf{C}}(x)$ requires the inversion of an $mn\times mn$ matrix, with complexity $O(m^{3}n^{3})$ , followed by $N_{1}N_{3}/(mn)$ matrix-vector multiplications. Each of these multiplies the inverted matrix by a column vector of length $mn$ that collects the received evaluations of the matrix polynomial $p_{\mathbf{C}}(x)$ at some position $(i,j)\in[N_{1}/m]\times[N_{3}/n]$ ; together they have complexity $O(N_{1}N_{3}m^{2}n^{2}/(mn))=O(N_{1}N_{3}mn)$ . Thus, assuming that $mn\ll N_{1},N_{3}$ , the overall decoding complexity is $O(N_{1}N_{3}mn)$ .

Communication Cost: The master node sends $O(N_{1}N_{2}P/m+N_{2}N_{3}P/n)$ symbols, and the fusion node receives $O(N_{1}N_{3})$ symbols from the successful worker nodes.

Remark VI.1

With the reasonable assumption that the dimensions of the input matrices $\mathbf{A},\mathbf{B}$ are large enough such that $N_{1},N_{2},N_{3}\gg m,n,P$ , we can conclude that the encoding and decoding costs at the master and fusion nodes, respectively, are negligible compared to the computation cost at each worker node.

VI-D Numerical Results

In our experiments, the entries of the input matrices $\mathbf{A},\mathbf{B}$ are chosen independently according to the standard Gaussian distribution $\mathcal{N}(0,1)$ . In addition, for any two input matrices $\mathbf{A},\mathbf{B}$ , let $\hat{\mathbf{C}}$ be the output of the distributed system, we define the relative error between $\mathbf{A}\mathbf{B}$ and $\hat{\mathbf{C}}$ to be

$$\frac{\|\mathbf{A}\mathbf{B}-\hat{\mathbf{C}}\|_{F}}{\|\mathbf{A}\mathbf{B}\|_{F}}.$$

Fig. 10 shows how the maximum relative error (i.e., the worst case relative error over all possible sets of $P-s$ successful nodes, for a fixed number of parity workers $s$ ) grows with the size of the distributed system for both Construction 3 and Polynomial Codes. In Fig. 10, we plot the average over five different realizations of the system at each system size $P$ . The figure shows that Polynomial Codes have unacceptable relative errors once the size of the system exceeds $50$ workers, where the relative error is around $10^{5}$ . On the other hand, OrthoPoly can support systems with up to $170$ worker nodes while keeping the relative error below $10^{-5}$ .
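The gap between the two bases can also be probed through condition numbers on small systems. The following sketch is not the experiment behind Fig. 10: it only compares the worst case 2-norm condition number of the interpolation matrix over all survivor sets, and it assumes (for illustration) that both bases are evaluated on the same Chebyshev grid, taken here to be the roots of $T_{P}$ . The function names are hypothetical.

```python
from itertools import combinations

import numpy as np
from numpy.polynomial import chebyshev as cheb


def chebyshev_grid(P):
    """P-point Chebyshev grid (roots of T_P); assumed form of rho^(P)."""
    r = np.arange(1, P + 1)
    return np.cos((2 * r - 1) * np.pi / (2 * P))


def worst_case_condition(P, s, basis):
    """Largest 2-norm condition number over all (P - s)-node survivor sets."""
    k = P - s
    x = chebyshev_grid(P)
    if basis == "chebyshev":
        V = cheb.chebvander(x, k - 1)          # columns T_0, ..., T_{k-1}
    else:
        V = np.vander(x, k, increasing=True)   # columns 1, x, ..., x^{k-1}
    return max(np.linalg.cond(V[list(R), :]) for R in combinations(range(P), k))


cheb_worst = worst_case_condition(10, 1, "chebyshev")
mono_worst = worst_case_condition(10, 1, "monomial")
# With no stragglers, the Chebyshev-Vandermonde matrix on the full grid has
# orthogonal columns, so its condition number is exactly sqrt(2).
full_grid = np.linalg.cond(cheb.chebvander(chebyshev_grid(10), 9))
```

The exhaustive search over survivor sets is only practical for small $P$ and $s$ ; it is meant to show the trend, not to reproduce the reported error curves.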

VII Generalized OrthoMatDot: Numerically stable Codes for Matrix Multiplication with Communication/Computation-Recovery Threshold Trade-off

Although MatDot Codes [3] have a low recovery threshold of $2m-1$ compared with Polynomial Codes [2], which have a recovery threshold of $mn$ , the worker node to fusion node communication cost and the computation cost per worker of MatDot Codes are higher than those of Polynomial Codes. Codes proposed in [4, 5, 26] offer a trade-off between the communication/computation cost and the recovery threshold. However, all of these codes are based on the “ill-conditioned” monomial basis. In this section, we offer a numerically stable code construction, denoted by Generalized OrthoMatDot, that offers a trade-off between communication/computation costs and recovery threshold. Our construction incurs a higher recovery threshold than the codes of [5, 26] by a factor of at most $4$ for the same communication/computation cost. We provide in Section VII-A the formal problem statement considered in this section. We describe an example of our construction in Section VII-B, provide the general code construction in Section VII-C, and describe our numerical experiments in Section VII-D.

VII-A System Model and Problem Formulation

We consider the same system model and problem formulation as in Section IV-A with the following change: We assume that the master node is allowed to send an encoded ${1}/{m}$ fraction of matrix $\mathbf{A}$ , and an encoded ${1}/{n}$ fraction of matrix $\mathbf{B}$ , where $m$ and $n$ are not necessarily equal, and $\mathbf{A}$ and $\mathbf{B}$ are split as follows

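$$\mathbf{A}=\begin{pmatrix}\mathbf{A}_{0,0}&\cdots&\mathbf{A}_{0,m_{2}-1}\\ \vdots&\ddots&\vdots\\ \mathbf{A}_{m_{1}-1,0}&\cdots&\mathbf{A}_{m_{1}-1,m_{2}-1}\end{pmatrix},\qquad \mathbf{B}=\begin{pmatrix}\mathbf{B}_{0,0}&\cdots&\mathbf{B}_{0,m_{3}-1}\\ \vdots&\ddots&\vdots\\ \mathbf{B}_{m_{2}-1,0}&\cdots&\mathbf{B}_{m_{2}-1,m_{3}-1}\end{pmatrix},$$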

where $m_{1},m_{2},m_{3}$ divide $N_{1},N_{2},N_{3}$ , respectively, and $m=m_{1}m_{2},n=m_{2}m_{3}$ . In addition, we assume that each worker node receives a linear combination of sub-matrices $\mathbf{A}_{i,j}$ , and another linear combination of sub-matrices $\mathbf{B}_{i,j}$ .

Remark VII.1

Although, in this section, we offer Generalized OrthoMatDot, a code construction with lower condition numbers than the codes in [5, 26], the recovery threshold of our codes is higher by a factor of at most $4$ than that of the codes in these references. Specifically, Generalized OrthoMatDot codes have a recovery threshold of $4m_{1}m_{2}m_{3}-2(m_{1}m_{2}+m_{2}m_{3}+m_{3}m_{1})+m_{1}+2m_{2}+m_{3}-1$ while both codes in [5, 26] have a recovery threshold of $m_{1}m_{2}m_{3}+m_{2}-1$ . This increased recovery threshold is due to the fact that Generalized OrthoMatDot Codes are based on Chebyshev polynomials, which have the following property: For any $i,j\in\mathbb{N}$ , $T_{i}(x)T_{j}(x)=\frac{1}{2}(T_{i+j}(x)+T_{|i-j|}(x))$ . This property produces a larger number of undesired terms in the product of the encoding polynomials $p_{\mathbf{A}}(x),p_{\mathbf{B}}(x)$ . In order to avoid combining undesired and desired terms at the same degree, higher degree Chebyshev polynomials have to be used in $p_{\mathbf{B}}(x)$ , yielding a higher recovery threshold. It is still an open question whether the recovery threshold in [5, 26] can be achieved using orthonormal polynomials.
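The product property quoted in this remark can be checked numerically. The short sketch below uses NumPy's Chebyshev-series multiplication; the helper name `chebyshev_product` is hypothetical and is introduced only for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def chebyshev_product(i, j):
    """Chebyshev-basis coefficients of T_i(x) * T_j(x)."""
    t_i = np.zeros(i + 1)
    t_i[i] = 1.0
    t_j = np.zeros(j + 1)
    t_j[j] = 1.0
    return cheb.chebmul(t_i, t_j)


# T_2 * T_5 = (T_7 + T_3) / 2: two terms appear, at degrees i + j and |i - j|.
coeffs = chebyshev_product(2, 5)

# Pointwise check of the same identity at a few sample points.
x = np.linspace(-1.0, 1.0, 7)
lhs = cheb.chebval(x, [0, 0, 1]) * cheb.chebval(x, [0, 0, 0, 0, 0, 1])
rhs = cheb.chebval(x, coeffs)
```

By contrast, a product of monomials $x^{i}x^{j}=x^{i+j}$ yields a single term. The extra term at degree $|i-j|$ is the source of the additional undesired terms discussed in the remark.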

VII-B Example $(m_{1}=m_{2}=m_{3}=2)$

Consider computing the matrix multiplication $\mathbf{A}\mathbf{B}$ , for some two real matrices $\mathbf{A},\mathbf{B}$ of dimensions $N_{1}\times N_{2}$ and $N_{2}\times N_{3}$ , respectively, over a distributed system of $P\geq 15$ workers such that:

1. Each worker receives an encoded matrix of $\mathbf{A}$ of dimension $N_{1}/2\times N_{2}/2$ , and an encoded matrix of $\mathbf{B}$ of dimension $N_{2}/2\times N_{3}/2$ .

2. The product $\mathbf{A}\mathbf{B}$ can be recovered by the fusion node given the results of any $15$ worker nodes.

A solution can be as follows: First, matrices $\mathbf{A},\mathbf{B}$ can be partitioned as

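$$\mathbf{A}=\begin{pmatrix}\mathbf{A}_{0,0}&\mathbf{A}_{0,1}\\ \mathbf{A}_{1,0}&\mathbf{A}_{1,1}\end{pmatrix},\qquad \mathbf{B}=\begin{pmatrix}\mathbf{B}_{0,0}&\mathbf{B}_{0,1}\\ \mathbf{B}_{1,0}&\mathbf{B}_{1,1}\end{pmatrix},$$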

where, for $i,j\in\{0,1\}$ , $\mathbf{A}_{i,j}$ has dimension $N_{1}/2\times N_{2}/2$ , and $\mathbf{B}_{i,j}$ has dimension $N_{2}/2\times N_{3}/2$ . Next, let

[TABLE]

where $\alpha,\beta$ are to be specified next, and define $P$ distinct real numbers $x_{1},x_{2},\cdots,x_{P}$ in the range $[-1,1]$ . For each worker node $r\in[P]$ , the master node sends $p_{\mathbf{A}}(x_{r})$ and $p_{\mathbf{B}}(x_{r})$ .

Now, in order to specify the best values for $\alpha,\beta$ , we expand the polynomial $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ in the Chebyshev basis, and then point out some observations.

[TABLE]

Using the property of the Chebyshev polynomials that for any $i,j\in\mathbb{N}$ , $T_{i}(x)T_{j}(x)=\frac{1}{2}(T_{i+j}(x)+T_{|i-j|}(x))$ , (VII-B) can be rewritten as

[TABLE]

Now, note the following regarding $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ in (VII-B):

(i) $\frac{1}{2}\left(\mathbf{A}_{0,0}\mathbf{B}_{0,0}+\mathbf{A}_{0,1}\mathbf{B}_{1,0}\right)$ is the coefficient of $T_{1}(x)$ ,

(ii) $\frac{1}{2}\left(\mathbf{A}_{1,0}\mathbf{B}_{0,0}+\mathbf{A}_{1,1}\mathbf{B}_{1,0}\right)$ is the coefficient of $T_{\alpha+1}(x)$ ,

(iii) $\frac{1}{2}\left(\mathbf{A}_{0,0}\mathbf{B}_{0,1}+\mathbf{A}_{0,1}\mathbf{B}_{1,1}\right)$ is the coefficient of $T_{\beta+1}(x)$ ,

(iv) $\frac{1}{2}\left(\mathbf{A}_{1,0}\mathbf{B}_{0,1}+\mathbf{A}_{1,1}\mathbf{B}_{1,1}\right)$ is the coefficient of $T_{\beta+\alpha+1}(x)$ .

Since $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ has degree $\beta+\alpha+2$ , and this polynomial is evaluated at a distinct point at each worker node, once the fusion node receives the output of any $\beta+\alpha+3$ worker nodes, it can interpolate $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ and extract the product $\mathbf{A}\mathbf{B}$ (i.e., the matrix coefficients of $T_{1}(x),T_{\alpha+1}(x),T_{\beta+1}(x),T_{\beta+\alpha+1}(x)$ ). Now, we aim to pick values for $\alpha,\beta$ such that the degree of $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ is minimal, and hence the recovery threshold is minimal as well. These values must be chosen such that the desired coefficients in (i)-(iv) are separate; that is, none of them is combined with another desired term or with an undesired term. This constraint leads to the following two inequalities:

[TABLE]

which implies that $\alpha=3,\beta=9$ . Next, we provide our general code construction for the Generalized OrthoMatDot Codes.

VII-C Generalized OrthoMatDot Code Construction

Theorem VII.1

For the matrix multiplication problem described in Section VII-A computed on the system defined in Section IV-A1, there exists a coding strategy with recovery threshold

$$4m_{1}m_{2}m_{3}-2(m_{1}m_{2}+m_{2}m_{3}+m_{3}m_{1})+m_{1}+2m_{2}+m_{3}-1.$$

Notice that the problem specified in Section VII-A restricts the output matrix of each worker node to be of dimension $N_{1}/m_{1}\times N_{3}/m_{3}$ , for some positive integers $m_{1},m_{3}$ that divide $N_{1},N_{3}$ , respectively. This is smaller than the dimensions of the output matrix of each worker node according to the problem specified in Section IV-A2 (i.e., $N_{1}\times N_{3}$ ) by a factor of $m_{1}m_{3}$ . However, according to Theorem VII.1, this communication advantage, when $m_{1}>1$ or $m_{2}>1$ , comes at the expense of a higher recovery threshold compared to OrthoMatDot Codes.

Remark VII.2 (Notation)

For ease of exposition in the remainder of this section, we use $T'_{0},T'_{1},T'_{2},\cdots$ to denote $\frac{1}{2}T_{0},T_{1},T_{2},\cdots$ , respectively.

In order to prove Theorem VII.1, we first present a code construction that achieves the recovery threshold in (VII.1), then we prove that the presented code construction is valid. First, note that in the Generalized OrthoMatDot code construction, we assume that the two input matrices $\mathbf{A},\mathbf{B}$ are split as in (115). Also, note that given this partitioning of input matrices, we can write $\mathbf{C}=\mathbf{A}\mathbf{B}$ , where $\mathbf{C}$ is written as

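$$\mathbf{C}=\begin{pmatrix}\mathbf{C}_{0,0}&\cdots&\mathbf{C}_{0,m_{3}-1}\\ \vdots&\ddots&\vdots\\ \mathbf{C}_{m_{1}-1,0}&\cdots&\mathbf{C}_{m_{1}-1,m_{3}-1}\end{pmatrix},$$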

and each of $\mathbf{C}_{i,l}$ has dimension ${N_{1}}/{m_{1}}\times{N_{3}}/{m_{3}}$ and can be expressed as $\mathbf{C}_{i,l}=\sum_{j=0}^{m_{2}-1}\mathbf{A}_{i,j}\mathbf{B}_{j,l},$ for any $i\in\{0,1,\cdots,m_{1}-1\},$ and $l\in\{0,1,\cdots,m_{3}-1\}$ . Also, let $x_{1},\cdots,x_{P}$ be distinct real numbers in the range $[-1,1]$ , and define encoding polynomials

[TABLE]

and let $p_{\mathbf{C}}(x)=p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ . Notice that $p_{\mathbf{C}}(x)$ is a matrix polynomial of degree $\deg_{\mathbf{C}}:=4m_{1}m_{2}m_{3}-2(m_{1}m_{2}+m_{2}m_{3}+m_{3}m_{1})+m_{1}+2m_{2}+m_{3}-2$ .

Claim VII.2

For any $i\in\{0,1,\cdots,m_{1}-1\}$ and $l\in\{0,1,\cdots,m_{3}-1\}$ , $\frac{1}{2}\mathbf{C}_{i,l}$ is the matrix coefficient of $T_{m_{2}-1+i(2m_{2}-1)+l(2m_{1}-1)(2m_{2}-1)}$ in $p_{\mathbf{C}}(x)$ .

The proof of this claim is in Appendix C.
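The index formula in Claim VII.2 and the recovery threshold of Theorem VII.1 are easy to tabulate. The sketch below is a small bookkeeping aid with hypothetical function names; it uses only the formulas stated in the text, and it checks them against the example of Section VII-B ( $m_{1}=m_{2}=m_{3}=2$ , where $\alpha=3$ and $\beta=9$ ).

```python
def desired_index(i, l, m1, m2):
    """Degree of the Chebyshev term whose coefficient is C_{i,l} / 2 (Claim VII.2)."""
    return m2 - 1 + i * (2 * m2 - 1) + l * (2 * m1 - 1) * (2 * m2 - 1)


def generalized_orthomatdot_threshold(m1, m2, m3):
    """Recovery threshold of Generalized OrthoMatDot, i.e. deg_C + 1."""
    return (4 * m1 * m2 * m3
            - 2 * (m1 * m2 + m2 * m3 + m3 * m1)
            + m1 + 2 * m2 + m3 - 1)


def monomial_threshold(m1, m2, m3):
    """Recovery threshold quoted in the text for the codes of [5, 26]."""
    return m1 * m2 * m3 + m2 - 1


m1 = m2 = m3 = 2
indices = [desired_index(i, l, m1, m2) for l in range(m3) for i in range(m1)]
threshold = generalized_orthomatdot_threshold(m1, m2, m3)
baseline = monomial_threshold(m1, m2, m3)
```

For this example the desired coefficients sit at degrees $1$ , $\alpha+1$ , $\beta+1$ , and $\beta+\alpha+1$ , all of which are at most $\deg_{\mathbf{C}}=\beta+\alpha+2$ .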

We describe, next, the idea of our proposed Generalized OrthoMatDot code construction. First, for all $r\in[P]$ , the master node sends to the $r$ -th worker evaluations of $p_{\mathbf{A}}(x)$ and $p_{\mathbf{B}}(x)$ at $x=\rho^{(P)}_{r}$ , that is, it sends $p_{\mathbf{A}}(\rho^{(P)}_{r})$ and $p_{\mathbf{B}}(\rho^{(P)}_{r})$ to the $r$ -th worker. Next, for every $r\in[P]$ , the $r$ -th worker node computes the matrix product $p_{\mathbf{C}}(\rho^{(P)}_{r})=p_{\mathbf{A}}(\rho^{(P)}_{r})p_{\mathbf{B}}(\rho^{(P)}_{r})$ and sends the result to the fusion node. Once the fusion node receives the output of any $\deg_{\mathbf{C}}+1$ worker nodes, it interpolates $p_{\mathbf{C}}(x)$ .

We formally present our Generalized OrthoMatDot code construction in Construction 4. In the following, we explain the notation used in Construction 4. The output of the algorithm is the $N_{1}\times N_{3}$ matrix $\hat{\mathbf{C}},$ where the $(k,l)$ -th block of $\hat{\mathbf{C}}$ is the $N_{1}/m_{1}\times N_{3}/m_{3}$ matrix $\hat{\mathbf{C}}_{k,l}$ , and the $(i,j)$ -th entry of any matrix $\hat{\mathbf{C}}_{k,l}$ is $\hat{{c}}_{k,l}^{(i,j)}$ . The $(i,j)$ -th entry of the matrix polynomial $p_{\mathbf{C}}(x)$ is denoted as $p^{(i,j)}_{\mathbf{C}}(x)$ , and Section III-B defines matrices $\mathbf{G}^{(\deg_{\mathbf{C}}+1,P)}(\bm{\rho}^{(P)})$ and $\mathbf{G}^{(\deg_{\mathbf{C}}+1,P)}_{\mathcal{R}}(\bm{\rho}^{(P)})$ , for any subset $\mathcal{R}=\{{r_{1}},\cdots,{r_{\deg_{\mathbf{C}}+1}}\}\subset[P]$ .

Now, we prove Theorem VII.1.

Proof:

To prove the theorem, it suffices to prove that Construction 4 is valid. Noting that $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ has degree $4m_{1}m_{2}m_{3}-2(m_{1}m_{2}+m_{2}m_{3}+m_{3}m_{1})+m_{1}+2m_{2}+m_{3}-2$ and every worker node sends an evaluation of $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ at a distinct point, once the fusion node receives the output of any $4m_{1}m_{2}m_{3}-2(m_{1}m_{2}+m_{2}m_{3}+m_{3}m_{1})+m_{1}+2m_{2}+m_{3}-1$ worker nodes, it can interpolate $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ (i.e., obtain all its matrix coefficients). This includes the coefficients of $T_{m_{2}-1+i(2m_{2}-1)+l(2m_{1}-1)(2m_{2}-1)}$ for all $i\in\{0,1,\cdots,m_{1}-1\},$ and $l\in\{0,1,\cdots,m_{3}-1\}$ , which by Claim VII.2 are $\frac{1}{2}\mathbf{C}_{i,l}$ and therefore determine $\mathbf{C}_{i,l}$ for all such $i$ and $l$ . This completes the proof. ∎

Next, we provide the different complexity analyses of the Generalized OrthoMatDot Codes.

VII-C1 Complexity Analyses of Generalized OrthoMatDot

Encoding Complexity: Encoding for each worker requires performing two additions, the first one adds $m_{1}m_{2}$ scaled matrices of size $N_{1}N_{2}/(m_{1}m_{2})$ and the other adds $m_{2}m_{3}$ scaled matrices of size $N_{2}N_{3}/(m_{2}m_{3})$ , for an overall encoding complexity for each worker of $O(N_{1}N_{2}+N_{2}N_{3})$ . Therefore, the overall computational complexity of encoding for $P$ workers is $O(N_{1}N_{2}P+N_{2}N_{3}P)$ .

Computational Cost per Worker: Each worker multiplies two matrices of dimensions $N_{1}/m_{1}\times N_{2}/m_{2}$ and $N_{2}/m_{2}\times N_{3}/m_{3}$ , requiring $O(N_{1}N_{2}N_{3}/(m_{1}m_{2}m_{3}))$ operations.

Decoding Complexity: Since $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ has degree $k-1:=4m_{1}m_{2}m_{3}-2(m_{1}m_{2}+m_{2}m_{3}+m_{3}m_{1})+m_{1}+2m_{2}+m_{3}-2$ , the interpolation of $p_{\mathbf{C}}(x)$ requires the inversion of a $k\times k$ matrix, with complexity $O(k^{3})=O(m_{1}^{3}m_{2}^{3}m_{3}^{3})$ , followed by $N_{1}N_{3}/(m_{1}m_{3})$ matrix-vector multiplications. Each of these multiplies the inverted matrix by a column vector of length $k$ that collects the received evaluations of the matrix polynomial $p_{\mathbf{C}}(x)$ at some position $(i,j)\in[N_{1}/m_{1}]\times[N_{3}/m_{3}]$ ; together they have complexity $O(N_{1}N_{3}k^{2}/(m_{1}m_{3}))=O(N_{1}N_{3}m_{1}m_{2}^{2}m_{3})$ . Thus, assuming that $m_{1},m_{3}\ll N_{1},N_{3}$ , the overall decoding complexity is $O(N_{1}N_{3}m_{1}m_{2}^{2}m_{3})=O(N_{1}N_{3}mn)$ .

Communication Cost: The master node sends $O(N_{1}N_{2}P/(m_{1}m_{2})+N_{2}N_{3}P/(m_{2}m_{3}))$ symbols, and the fusion node receives $O(N_{1}N_{3}m_{2})$ symbols from the successful worker nodes.

Remark VII.3

With the reasonable assumption that the dimensions of the input matrices $\mathbf{A},\mathbf{B}$ are large enough such that $N_{1},N_{2},N_{3}\gg m_{1},m_{2},m_{3},P$ , we can conclude that the encoding and decoding costs at the master and fusion nodes, respectively, are negligible compared to the computation cost at each worker node.

VII-D Numerical Results

In our experiments on Construction 4, we considered distributed systems with $P=16,25$ worker nodes. Fig. 11 shows that, for every examined system, the condition number of the interpolation matrix using the Generalized OrthoMatDot Codes is lower than that of the codes in [5, 26]. The results in Fig. 11 also show that, for the same system, as the partitioning factor $m_{1}$ decreases (i.e., as the redundancy in worker nodes increases), the stability of the Generalized OrthoMatDot code construction decreases; however, it is still better than that of the monomial-basis codes in all cases.

VIII Numerically Stable Lagrange Coded Computing

In this section, we study the numerical stability of Lagrange coded computing [12] that lifts coded computing beyond matrix-vector and matrix-matrix multiplications, to multi-variate polynomial computations. As shown in [12], Lagrange coded computing has applications in gradient coding, privacy and secrecy. Our main contribution here is to develop a numerically stable approach towards Lagrange coded computing inspired by our result of Theorem V.1. In particular, our contribution involves (a) careful choice of evaluation points, and (b) a careful decoding algorithm that involves inversion of the appropriate Chebyshev Vandermonde matrix. We describe the system model in Section VIII-A. We overview the Lagrange coded computing technique of [12] in Section VIII-B. We describe our numerically stable approach in Section VIII-C, and present the results of our numerical experiments in Section VIII-D.

VIII-A System Model and Problem Formulation

We consider, for this section, the distributed computing framework depicted in Fig. 12, that is used in [12] and consists of a master node, $P$ worker nodes, and a fusion node where the only communication allowed is from the master node to the different worker nodes and from the worker nodes to the fusion node. The worker nodes have a prior knowledge of a polynomial function of interest $f:\mathbb{R}^{d}\rightarrow\mathbb{R}^{v}$ of degree $\operatorname{deg}(f)$ , where $d,v\in\mathbb{N}^{+}$ . In addition, the master node possesses a set of data points $\mathcal{X}=\{X_{1},\cdots,X_{m}\}$ , where $X_{i}\in\mathbb{R}^{d}$ , $i\in[m]$ . For every worker node $i\in[P]$ , the master node is allowed to send some encoded vector $\tilde{X}_{i}(X_{1},\cdots,X_{m})\in\mathbb{R}^{d}$ . Once a worker node receives the encoded vector on its input, it evaluates $f$ at this encoded vector and sends the evaluation to the fusion node. That is, for $i\in[P]$ , worker node $i$ receives $\tilde{X}_{i}$ on its input, evaluates $f(\tilde{X}_{i})$ , then it sends the result to the fusion node. Finally, the fusion node is expected to numerically stably decode the set of evaluations $\mathcal{F}=\{f(X_{1}),\cdots,f(X_{m})\}$ after it receives the output of any $K$ worker nodes.

VIII-B Background on Lagrange Coded Computing

In this section, we review the baseline Lagrange coded computing method introduced in [12] considering the framework in Section VIII-A. Notice that although the method in [12] is more general, here, for simplicity, we limit our discussion to systematic Lagrange coded computing. That is, we assume that for $i\in[m]$ , worker node $i$ receives the $i$ -th data point from the master node. In other words, we assume that $\tilde{X}_{i}=X_{i},i\in[m]$ . Now, the encoding procedure goes as follows. Let $x_{1},\cdots,x_{P}$ be distinct real values; an encoding function $g(x)$ is defined as:

$$g(x)=\sum_{i=1}^{m}X_{i}\prod_{j\in[m]\setminus\{i\}}\frac{x-x_{j}}{x_{i}-x_{j}}.$$

Given this encoding function, the master node sends the encoded vector $\tilde{X}_{i}=g(x_{i})$ to the worker node $i$ , for every $i\in[P]$ . Notice that the encoding function $g(x)$ indeed leads to a systematic encoding since $\tilde{X}_{i}=g(x_{i})=X_{i},$ for all $i\in[m]$ . Every worker node $i$ computes $f(\tilde{X}_{i})$ upon the reception of $\tilde{X}_{i}$ , and sends the result to the fusion node. The fusion node waits until it receives the output of any $K:=(m-1)\deg(f)+1$ worker nodes. Since $f(g(x))$ has degree $(m-1)\deg(f)$ in $x$ , the fusion node is able to interpolate $f(g(x))$ after receiving the outputs of any $(m-1)\deg(f)+1$ , i.e., $K$ , worker nodes. Since $g(x_{i})=X_{i},i\in[m]$ , the fusion node evaluates the interpolated polynomial at $x_{1},\cdots,x_{m}$ , i.e., computes $\{f(g(x_{1})),\cdots,f(g(x_{m}))\}$ , to obtain $\{f(X_{1}),\cdots,f(X_{m})\}.$
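The systematic encoding step can be sketched as follows. This is a minimal illustration, assuming that $g$ is the Lagrange interpolating polynomial through $(x_{i},X_{i})$ , $i\in[m]$ , which is what the property $g(x_{i})=X_{i}$ and the degree count above require. The function names and the equispaced evaluation points are illustrative choices, not taken from [12].

```python
import numpy as np


def lagrange_encode(X, x_data, x_workers):
    """Evaluate g at every worker point, where g(x_data[i]) = X[i]."""
    X = np.asarray(X, dtype=float)
    encoded = np.zeros((len(x_workers), X.shape[1]))
    for i in range(len(x_data)):
        others = np.delete(x_data, i)
        # Lagrange basis polynomial of data point i, evaluated at all worker points.
        basis = np.prod((x_workers[:, None] - others) / (x_data[i] - others), axis=1)
        encoded += np.outer(basis, X[i])
    return encoded


def lcc_recovery_threshold(m, deg_f):
    """K = (m - 1) deg(f) + 1 worker outputs are needed to interpolate f(g(x))."""
    return (m - 1) * deg_f + 1


rng = np.random.default_rng(1)
P, m, d = 8, 3, 4
x_workers = np.linspace(-1.0, 1.0, P)   # any distinct reals work for the baseline
X = rng.standard_normal((m, d))         # m data points of dimension d
encoded = lagrange_encode(X, x_workers[:m], x_workers)
K = lcc_recovery_threshold(m, deg_f=2)
```

The first $m$ rows of `encoded` reproduce the data points, which is the systematic property; the remaining rows are the redundant encoded vectors sent to the other workers.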

VIII-C Numerically Stable Lagrange Coded Computing

Lagrange coded computing requires performing an interpolation at the fusion node to recover the polynomial $f(g(x))$ . Performing the interpolation by obtaining the coefficients of the polynomial in a monomial basis requires inverting a square Vandermonde matrix, which is numerically unstable. Noting that the first $\ell$ Chebyshev polynomials also form a basis for polynomials of degree at most $\ell-1$ , we provide an alternative decoding procedure whose key idea is to find the coefficients of the polynomial $f(g(x))$ in the basis of Chebyshev polynomials. Thereby, our decoding procedure involves inverting the Chebyshev-Vandermonde matrix (see footnote 4). Guided by Theorem V.1, we choose the evaluation points to be the $P$ -point Chebyshev grid $\bm{\rho}^{(P)}$ to obtain a decoding procedure that is more stable than one that uses the monomial basis.

Footnote 4: Since both systematic and non-systematic Lagrange coded computing require the inversion of the same Chebyshev-Vandermonde matrix, our numerically stable decoding procedure in Construction 5 naturally extends to non-systematic Lagrange coded computing, the only difference being in the last step of evaluating $f(g(x))$ at $x_{1},\cdots,x_{m}$ , where in the non-systematic case, $f(g(x))$ is instead evaluated at some predefined values $y_{1},\cdots,y_{m}$ such that $g(y_{i})=X_{i}$ for all $i\in[m].$

Our numerically stable algorithm for Lagrange coded computing is formally described in Construction 5.

In the following, we explain the notation used in Construction 5. We let the polynomial at the $i$ -th entry of $f(g(x))$ be denoted $f^{(i)}{(x)}$ and written as $f^{(i)}(x)=\sum_{l=0}^{K-1}c_{l}^{(i)}T_{l}(x)$ . Following the notation in Section III-B, we use the Chebyshev-Vandermonde matrices $\mathbf{G}^{(K,P)}(\bm{\rho}^{(P)})$ and $\mathbf{G}^{(K,P)}_{\mathcal{R}}(\bm{\rho}^{(P)})$ , for any subset $\mathcal{R}=\{{r_{1}},\cdots,{r_{K}}\}\subset[P]$ ; we also define the matrix $\mathbf{G}^{(K,P)}_{[m]}(\bm{\rho}^{(P)})$ . Finally, we assume that our construction returns as output the set of evaluations $\hat{\mathcal{F}}=\{\hat{f}(X_{1}),\cdots,\hat{f}(X_{m})\}$ , where for each $\hat{f}(X_{i}),i\in[m]$ , we have $\hat{f}(X_{i})=(\hat{f}^{(1)}(x_{i}),\cdots,\hat{f}^{(v)}(x_{i}))$ , and where, for every $i\in[m],j\in[v]$ , $\hat{f}^{(j)}(x_{i})$ and ${f}^{(j)}(x_{i})$ would be the same if the machine had infinite precision.
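The decoding idea, namely solving for the Chebyshev coefficients $c_{l}$ from any $K$ worker outputs and then evaluating at the first $m$ grid points, can be sketched as below. This is a hedged illustration and not Construction 5 itself: the function names are hypothetical, the grid is assumed to be the roots of $T_{P}$ , and a random degree $K-1$ polynomial stands in for one entry of $f(g(x))$ . The matrix built by `chebvander` is the transpose of the paper's $\mathbf{G}_{\mathcal{R}}$ convention (rows index evaluation points here).

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def chebyshev_grid(P):
    """P-point Chebyshev grid (roots of T_P); assumed form of rho^(P)."""
    r = np.arange(1, P + 1)
    return np.cos((2 * r - 1) * np.pi / (2 * P))


def decode_in_chebyshev_basis(outputs, x_workers, survivors, K, x_targets):
    """Interpolate a degree-(K-1) polynomial from K survivors; evaluate at x_targets."""
    R = np.asarray(survivors)[:K]
    G_R = cheb.chebvander(x_workers[R], K - 1)       # G_R[r, l] = T_l(x_r)
    coeffs = np.linalg.solve(G_R, outputs[R])        # Chebyshev coefficients c_l
    return cheb.chebvander(x_targets, K - 1) @ coeffs


rng = np.random.default_rng(2)
P, m, K = 12, 5, 9                                   # K = (m - 1) * 2 + 1 for deg(f) = 2
x = chebyshev_grid(P)
true_coeffs = rng.standard_normal(K)                 # stand-in for one entry of f(g(x))
outputs = cheb.chebval(x, true_coeffs)               # what the P workers would return
survivors = list(range(P - 1, 2, -1))                # workers 0, 1, 2 are stragglers
decoded = decode_in_chebyshev_basis(outputs, x, survivors, K, x[:m])
```

In this run three of the systematic workers are among the stragglers, so their values are recovered through interpolation and not read off directly.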

In the following, we show through numerical experiments the stability of our proposed Construction 5.

VIII-D Numerical Results

In our experiments, we assume that we have a distributed system of $P$ worker nodes and $m=P-2$ data points/input vectors $X_{1},\cdots,X_{m}$ , each of dimension $d=10$ , where each entry of every input vector is picked independently according to the standard Gaussian distribution $\mathcal{N}(0,1)$ . The function of interest in this system is $f(X)=Y^{T}X$ , where $Y$ is some $d$ -dimensional vector with entries picked independently according to the standard Gaussian distribution $\mathcal{N}(0,1)$ . In our experiments, we compare Construction 5, where the Chebyshev basis is used for interpolation, with the case where the monomial basis is used for interpolation instead. Letting $\hat{\mathbf{f}}=(\hat{f}(X_{1})\cdots\hat{f}(X_{m}))$ be the system’s output vector and ${\mathbf{f}}=({f}(X_{1})\cdots{f}(X_{m}))$ be the correct output vector, we define the relative error between $\mathbf{f}$ and $\hat{\mathbf{f}}$ to be

[TABLE]

The results, shown in Fig. 13, illustrate that using the Chebyshev basis for interpolation provides a lower relative error, i.e., higher stability, than the monomial basis at every system size. Fig. 13 also shows that under a given relative error constraint, Construction 5 provides higher scalability than the monomial basis case. Specifically, assuming that a relative error of up to $0.1$ can be tolerated, Fig. 13 shows that the monomial-basis interpolation construction can only support systems with fewer than $40$ worker nodes. However, for the same relative error constraint, Construction 5 can support systems with up to $100$ worker nodes.

IX Concluding Remarks

In this paper, we develop numerically stable codes for matrix-matrix multiplication and Lagrange coded computing. A distinctive character of our work is the infusion of principles of numerical approximation theory into coded computing towards the end goal of numerical stability. In particular, our work is marked by the use of orthogonal polynomials for encoding, Gauss quadrature techniques for decoding and new bounds on the condition number of Chebyshev Vandermonde matrices. Notably, our constructions obtain the same recovery threshold as MatDot Codes and Polynomial Codes for matrix multiplication as well as for Lagrange Coded Computing. However, our construction in Section VII obtains a weaker (higher) recovery threshold than previous constructions [26, 5] for the problem of coded matrix multiplication when the computation/communication cost is constrained to be lower than that of MatDot Codes. The search of numerically stable codes for this application with the same recovery threshold as [26, 5] remains open.

While our paper focuses on applications where polynomial based encoding is particularly useful, our results might be useful for other applications as well. For instance, for the simple matrix-vector multiplication problem $\mathbf{A}\mathbf{x}$ performed in a distributed setting over $P$ worker nodes, where the goal is to encode $\mathbf{A}$ such that each worker stores a $1/m$ fraction of matrix $\mathbf{A},$ it is well known that MDS type codes can be used [13, 27]. Specifically, let $\mathbf{A}=\begin{bmatrix}\mathbf{A}_{1}\\ \mathbf{A}_{2}\\ \vdots\\ \mathbf{A}_{m}\end{bmatrix}$ and let $\mathbf{H}=(h_{ij})$ be an $m\times P$ matrix where every $m\times m$ submatrix of $\mathbf{H}$ has a full rank of $m$ . Then the $p$ -th worker for $p\in\{1,2,\ldots,P\}$ can compute $\left(\sum_{i=1}^{m}{h}_{ip}\mathbf{A}_{i}\right)\mathbf{x};$ the product $\mathbf{A}\mathbf{x}$ can be recovered from any $m$ of the $P$ nodes. The instinctual, Reed-Solomon inspired solution of choosing $\mathbf{H}$ to be a Vandermonde matrix is ill-conditioned over real numbers. Note however that, unlike the matrix multiplication problem, the matrix $\mathbf{H}$ does not need to have a polynomial structure. Indeed, choosing $\mathbf{H}$ to be a random Gaussian matrix leads to well-conditioned solutions with high probability. In particular, the following result follows from elementary arguments that build on [31].

Theorem IX.1

Let $\mathbf{H}$ be an $m\times P$ matrix, $P\geq m\geq 3$ , and let the entries of $\mathbf{H}$ be independent and identically distributed standard Gaussian random variables. Then,

[TABLE]

The theorem, which is proved in Appendix D, formally demonstrates that, for a fixed number of redundant workers $s=P-m,$ the worst case condition number grows as $O(mP^{2s})$ with high probability. However, the random Gaussian matrix approach has two drawbacks: (i) for a given realization of the random variables, it is difficult to verify whether it is well-conditioned, and (ii) the lack of structure could lead to more complex decoding. Our result of Theorem V.1 also indicates that choosing $\mathbf{H}=\mathbf{G}^{(m,P)}(\bm{\rho}^{(P)}),$ i.e., a Chebyshev-Vandermonde matrix, naturally provides a well-conditioned solution to this problem. Another solution for the matrix-vector multiplication problem is provided in [25] via universally decodable matrices [32]; in that work, numerical stability is demonstrated empirically.
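As a rough numerical illustration of this point, the sketch below compares the worst-case condition number over all $m\times m$ column submatrices for a Chebyshev-Vandermonde matrix and for a monomial-basis Vandermonde matrix on the same Chebyshev grid. It assumes NumPy and assumes that $\mathbf{G}^{(m,P)}(\bm{\rho}^{(P)})$ has entries $T_{i-1}(\rho^{(P)}_{j})$, as used in Appendix B; the helper `worst_condition_number` and the sizes $m=14$, $P=16$ are illustrative choices, not the paper's experimental setup.

```python
from itertools import combinations

import numpy as np
from numpy.polynomial import chebyshev as cheb


def worst_condition_number(H):
    """Largest 2-norm condition number over all m x m column submatrices of H."""
    m, P = H.shape
    return max(np.linalg.cond(H[:, list(cols)]) for cols in combinations(range(P), m))


m, P = 14, 16                                   # s = P - m = 2 redundant workers
idx = np.arange(1, P + 1)
rho = np.cos((2 * idx - 1) * np.pi / (2 * P))   # P-point Chebyshev grid (roots of T_P)

# Assumed form of the Chebyshev-Vandermonde matrix: row r holds T_r evaluated on the grid.
G_cheb = cheb.chebvander(rho, m - 1).T          # shape (m, P)
# Monomial-basis Vandermonde matrix on the same grid, for comparison.
G_mono = np.vander(rho, m, increasing=True).T   # shape (m, P)

worst_cheb = worst_condition_number(G_cheb)
worst_mono = worst_condition_number(G_mono)
print(f"worst-case condition number, Chebyshev basis: {worst_cheb:.3e}")
print(f"worst-case condition number, monomial basis:  {worst_mono:.3e}")
```

Using the same evaluation points for both matrices isolates the effect of the polynomial basis from the effect of the node placement.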

It is, however, important to note that the problems resolved in our paper are more restrictive, since matrix multiplication codes, where both matrices are to be encoded so that the product can be recovered, require much more structure than matrix-vector multiplication, where only one matrix is to be encoded. For instance, random Gaussian encoding does not naturally work for matrix multiplication to get a recovery threshold of $2m-1$ , and it is not clear whether the solution of [25] is applicable either. The utility of Chebyshev-Vandermonde matrices for a variety of coded computing problems, including matrix-vector multiplication, matrix multiplication and Lagrange coded computing, motivates the study of low-complexity decoding and error correction mechanisms for these systems.

Appendix A Proof of Claim IV.2

We have,

[TABLE]

In addition, noting that $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ (i.e., $p_{\mathbf{C}}(x)$ ) is of degree $2m-2$ (less than $2m$ ), Theorem III.2 implies that

[TABLE]

Finally, combining the two displayed equations above completes the proof. $\Box$

Appendix B Proof of Theorem V.1

We use the following trigonometric identity in our proof.

Lemma B.1

For $n\geq 0$ , let $x_{i}$ be chosen as in (10). Then $\prod_{j\neq i}(x_{i}-x_{j})=(-1)^{i-1}\frac{2^{1-n}n}{\sin(\frac{(2i-1)\pi}{2n})}$ .

Proof:

Note that $2^{n-1}\prod_{i=1}^{n}(x-x_{i})=T_{n}(x)=\cos(n\cos^{-1}(x))$ . Therefore,

[TABLE]

where $T_{n}^{\prime}(x)$ denotes the derivative of $T_{n}(x).$ Using $x_{i}=\cos(\frac{(2i-1)\pi}{2n})$ above we get the desired result. ∎
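The identity of Lemma B.1 is easy to check numerically. The following sketch uses only the Python standard library and the nodes $x_{i}=\cos(\frac{(2i-1)\pi}{2n})$ stated in the proof; the function names and the choice $n=8$ are illustrative.

```python
import math


def chebyshev_nodes(n):
    """Roots of T_n: x_i = cos((2i - 1) pi / (2n)) for i = 1, ..., n."""
    return [math.cos((2 * i - 1) * math.pi / (2 * n)) for i in range(1, n + 1)]


def node_product(nodes, i):
    """Left-hand side of Lemma B.1: prod_{j != i} (x_i - x_j), with 1-based i."""
    x_i = nodes[i - 1]
    prod = 1.0
    for j, x_j in enumerate(nodes, start=1):
        if j != i:
            prod *= x_i - x_j
    return prod


def closed_form(n, i):
    """Right-hand side of Lemma B.1."""
    return (-1) ** (i - 1) * 2.0 ** (1 - n) * n / math.sin((2 * i - 1) * math.pi / (2 * n))


n = 8
nodes = chebyshev_nodes(n)
max_rel_err = max(
    abs(node_product(nodes, i) - closed_form(n, i)) / abs(closed_form(n, i))
    for i in range(1, n + 1)
)
print(f"largest relative mismatch over all nodes: {max_rel_err:.2e}")
```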

Proof:

We show that any square sub-matrix of $\mathbf{G}^{(n-s,n)}(\bm{\rho}^{(n)})$ formed by any $n-s$ columns of $\mathbf{G}^{(n-s,n)}(\bm{\rho}^{(n)})$ satisfies the bound stated in the theorem. Let $\mathcal{S}$ be a subset of $[n]$ such that $|\mathcal{S}|=s$ , for some $s<n$ , and define $\mathbf{G}^{(n-s,n)}_{[n]-\mathcal{S}}(\bm{\rho}^{(n)})$ to be the square $(n-s)\times(n-s)$ submatrix of $\mathbf{G}^{(n-s,n)}(\bm{\rho}^{(n)})$ obtained by removing the columns with indices in $\mathcal{S}$ . Recalling the structure of $\mathbf{G}^{(n-s,n)}(\bm{\rho}^{(n)})$ from (32), we can write it as

[TABLE]

Moreover, for any $\mathcal{S}\subset[n]$ such that $|\mathcal{S}|=s$ , we can write

[TABLE]

where ${\Gamma}=(\gamma_{1},\gamma_{2},\cdots,\gamma_{n-s})=(\rho^{(n)}_{g_{1}},\rho^{(n)}_{g_{2}},\cdots,\rho^{(n)}_{g_{n-s}})$ , with $\{g_{i}\}_{i\in[n-s]}=[n]-\mathcal{S}$ and $g_{1}<g_{2}<\cdots<g_{n-s}$ . Now, notice that $||\mathbf{G}_{\Gamma}^{(n-s,n)}||^{2}_{F}=\sum_{i=1}^{n-s}\sum_{j=1}^{n-s}|T_{i-1}(\gamma_{j})|^{2}$ , and $|T_{i-1}(\gamma_{j})|\leq 1$ for any $i,j\in[n-s]$ , since every $\gamma_{j}$ lies in $[-1,1]$ . Therefore, we have

[TABLE]

In the following, we obtain an upper bound on $||(\mathbf{G}_{\Gamma}^{(n-s,n)})^{-1}||_{F}$ . Let $L_{\Gamma,k}$ be the $k$ -th Lagrange polynomial associated with $\Gamma$ , that is,

$L_{\Gamma,k}(x)=\prod_{j\in[n-s]-\{k\}}\frac{x-\gamma_{j}}{\gamma_{k}-\gamma_{j}},\quad k\in[n-s].$ (146)

Since $L_{\Gamma,k}(x)$ has a degree of $n-s-1$ , it can be written in terms of the Chebyshev basis $T_{0}(x),\cdots,T_{n-s-1}(x)$ as

$L_{\Gamma,k}(x)=\sum_{i=0}^{n-s-1}a_{i,k}T_{i}(x),$ (147)

for some real coefficients $a_{0,k},\cdots,a_{n-s-1,k}$ . Now, from (146), note the following property regarding $L_{\Gamma,k}(x)$ :

$L_{\Gamma,k}(\gamma_{j})=\begin{cases}1,&j=k,\\ 0,&j\neq k.\end{cases}$

Using this property and observing (147), we conclude that, for any $j\in[n-s]$ , $\sum_{i=0}^{n-s-1}a_{i,k}T_{i}(\gamma_{j})=\delta(k-j)$ . Therefore,

[TABLE]

where $\mathbf{I}_{n-s}$ is the $(n-s)\times(n-s)$ identity matrix. That is,

[TABLE]

Therefore,

[TABLE]

In addition, we have that

[TABLE]

From (159) and the preceding identity, we conclude that $||(\mathbf{G}_{\Gamma}^{(n-s,n)})^{-1}||_{F}^{2}=\sum_{k=1}^{n-s}\int_{-1}^{1}L^{2}_{\Gamma,k}(x)w(x)dx$ .

Now, we express the integral $\int_{-1}^{1}L^{2}_{\Gamma,k}(x)w(x)dx$ in the Gauss quadrature form using the $n$ roots of $T_{n}(x):$ $\rho_{1}^{(n)},\cdots,\rho_{n}^{(n)}$ . Note that this is a “trick” we use in the proof: it is possible to use the Gauss quadrature formula over $n-s$ nodes to express the integral of the degree $2(n-s-1)$ polynomial $L^{2}_{\Gamma,k}(x)$ . However, the use of $n$ nodes instead of $n-s$ nodes leads to a simple, tractable bound for $||(\mathbf{G}_{\Gamma}^{(n-s,n)})^{-1}||^{2}_{F}.$ Now, we can write

[TABLE]

for some constants $c_{1},\cdots,c_{n}$ . Moreover, $c_{1},\cdots,c_{n}$ for the Chebyshev polynomials of the first kind are, in fact, all equal to $\pi/n$ . Therefore, we have

[TABLE]

and, consequently,

[TABLE]
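The quadrature step can be illustrated with a small standard-library sketch. It assumes the unnormalized Chebyshev weight $w(x)=1/\sqrt{1-x^{2}}$, which is the weight for which all Gauss quadrature constants equal $\pi/n$; the paper's normalization of $w$ may differ by a constant factor. The helper names and the choice of removed nodes are hypothetical. The sketch checks that the $n$-node and $(n-s)$-node rules agree on $L^{2}_{\Gamma,k}$, since both are exact for that degree.

```python
import math


def chebyshev_nodes(n):
    """Roots of T_n: cos((2i - 1) pi / (2n)) for i = 1, ..., n."""
    return [math.cos((2 * i - 1) * math.pi / (2 * n)) for i in range(1, n + 1)]


def gauss_chebyshev(f, n):
    """n-node Gauss quadrature for int_{-1}^{1} f(x) / sqrt(1 - x^2) dx; all weights are pi/n."""
    return (math.pi / n) * sum(f(x) for x in chebyshev_nodes(n))


def lagrange(gamma, k):
    """k-th Lagrange polynomial (0-based k) on the node list gamma."""
    def L(x):
        value = 1.0
        for j, g_j in enumerate(gamma):
            if j != k:
                value *= (x - g_j) / (gamma[k] - g_j)
        return value
    return L


n, s = 6, 2
rho = chebyshev_nodes(n)
gamma = [rho[0], rho[2], rho[3], rho[5]]       # hypothetical choice: drop two of the n nodes
L_1 = lagrange(gamma, 1)


def L_sq(x):
    """Square of a Lagrange polynomial on gamma; degree 2(n - s - 1) = 6."""
    return L_1(x) ** 2


with_n_nodes = gauss_chebyshev(L_sq, n)          # exact up to degree 2n - 1 = 11
with_fewer_nodes = gauss_chebyshev(L_sq, n - s)  # exact up to degree 2(n - s) - 1 = 7
print(with_n_nodes, with_fewer_nodes)
```

With $n$ nodes, the evaluations of $L_{\Gamma,k}$ at the nodes that belong to $\Gamma$ are 0 or 1, so only the $s$ removed nodes contribute nontrivial terms to the sum.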

Now, from (146), note that $L_{\Gamma,k}(x)$ has the following evaluations

[TABLE]

Therefore, (163) can be written as

[TABLE]

In order to obtain our upper bound on $||(\mathbf{G}_{\Gamma}^{(n-s,n)})^{-1}||_{F}^{2}$ , in the following, we get an upper bound on the term $\prod_{j\in[n-s]-\{k\}}\left(\frac{\rho_{i}^{(n)}-\gamma_{j}}{\gamma_{k}-\gamma_{j}}\right)^{2}$ in (B). Notice that $\prod_{j\in[n-s]-\{k\}}\left(\frac{\rho_{i}^{(n)}-\gamma_{j}}{\gamma_{k}-\gamma_{j}}\right)^{2}$ can be written as

[TABLE]

where the last equality follows from Lemma B.1. Moreover, the product $\prod_{j\in[n-s]-\{k\}}\left(\rho_{g_{k}}^{(n)}-\rho^{(n)}_{g_{j}}\right)^{2}$ in (B) can be written as

[TABLE]

where the last equality follows from Lemma B.1. Now, substituting from (B) in (B) yields

[TABLE]

Using (B) in (B), we conclude that

[TABLE]

Finally, combining (145) and (172), we conclude that

[TABLE]

∎

Appendix C Proof of Claim VII.2

Let $\alpha=2m_{2}-1,\gamma=\alpha(2m_{1}-1)$ . $p_{\mathbf{A}}(x)$ in (VII-C) can be written as

[TABLE]

Similarly, $p_{\mathbf{B}}(x)$ in (VII-C) can be written as

[TABLE]

Now, the product $p_{\mathbf{A}}(x)p_{\mathbf{B}}(x)$ can be written as

[TABLE]

where,

[TABLE]

and,

[TABLE]

Now, in order to prove the claim, it suffices to prove the following two statements:

1) For any $i\in\{0,\cdots,m_{1}-1\},l\in\{0,\cdots,m_{3}-1\}$ , $\mathbf{C}_{i,l}$ is the matrix coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{1}(x)$ .

2) For any $i\in\{0,\cdots,m_{1}-1\},l\in\{0,\cdots,m_{3}-1\}$ , the matrix coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{2}(x)$ is $\mathbf{0}_{{N_{1}}/{m_{1}}\times{N_{3}}/{m_{3}}}$ , where $\mathbf{0}_{{N_{1}}/{m_{1}}\times{N_{3}}/{m_{3}}}$ is the ${N_{1}}/{m_{1}}\times{N_{3}}/{m_{3}}$ all zeros matrix.

In the following, we prove that statement 1) is true. In order to find the coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{1}(x)$ , we find the set $\mathcal{S}_{1}=\{(i^{\prime},j^{\prime},k^{\prime},l^{\prime}):m_{2}-1-j^{\prime}+i^{\prime}\alpha+k^{\prime}+l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma\}$ . Rewriting $m_{2}-1-j^{\prime}+i^{\prime}\alpha+k^{\prime}+l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma$ , we have

[TABLE]

(177) implies that $l^{\prime}=l$ . Indeed, suppose $l^{\prime}\neq l$ ; then $(k^{\prime}-j^{\prime})+(i^{\prime}-i)\alpha=c\gamma$ for some nonzero integer $c$ . However, this is a contradiction since $|(k^{\prime}-j^{\prime})+(i^{\prime}-i)\alpha|<\gamma$ , for any $i,i^{\prime},j^{\prime},k^{\prime}$ . Now, (177) can be written as

[TABLE]

Again, (178) implies $i^{\prime}=i$ . Indeed, suppose $i^{\prime}\neq i$ ; then $k^{\prime}-j^{\prime}=c\alpha$ , for some nonzero integer $c$ . However, this is a contradiction since $|k^{\prime}-j^{\prime}|<\alpha$ . Now, since $i^{\prime}=i$ , (178) implies $j^{\prime}=k^{\prime}$ . Thus, $\mathcal{S}_{1}=\{(i,j^{\prime},j^{\prime},l):j^{\prime}\in\{0,\cdots,m_{2}-1\}\}$ . That is, for any $i\in\{0,\cdots,m_{1}-1\},l\in\{0,\cdots,m_{3}-1\}$ , the matrix coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{1}(x)$ is $\sum_{j^{\prime}=0}^{m_{2}-1}\mathbf{A}_{i,j^{\prime}}\mathbf{B}_{j^{\prime},l}=\mathbf{C}_{i,l}$ .

Now, it remains to prove statement 2). That is, for any $i\in\{0,\cdots,m_{1}-1\},l\in\{0,\cdots,m_{3}-1\}$ , the matrix coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{2}(x)$ is $\mathbf{0}_{{N_{1}}/{m_{1}}\times{N_{3}}/{m_{3}}}$ . In order to find the coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{2}(x)$ , we find the sets $\mathcal{S}_{2}^{(1)}=\{(i^{\prime},j^{\prime},k^{\prime},l^{\prime}):m_{2}-1-j^{\prime}+i^{\prime}\alpha-k^{\prime}-l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma\}$ , and $\mathcal{S}_{2}^{(2)}=\{(i^{\prime},j^{\prime},k^{\prime},l^{\prime}):-m_{2}+1+j^{\prime}-i^{\prime}\alpha+k^{\prime}+l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma\}$ .

First, for the set $\mathcal{S}_{2}^{(1)}$ , rewriting $m_{2}-1-j^{\prime}+i^{\prime}\alpha-k^{\prime}-l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma$ , we get

[TABLE]

From (179), we conclude that $l+l^{\prime}=0$ . Otherwise, $(-j^{\prime}-k^{\prime})+(i^{\prime}-i)\alpha=c\gamma$ , for some integer $c$ , a contradiction since $|(-j^{\prime}-k^{\prime})+(i^{\prime}-i)\alpha|<\gamma$ . Since $l+l^{\prime}=0$ and both $l,l^{\prime}$ are non-negative, we conclude that $l^{\prime}=l=0$ . Moreover, now (179) reduces to

[TABLE]

Again, since $|-j^{\prime}-k^{\prime}|<\alpha$ , we conclude that $i^{\prime}=i$ , which implies that $j^{\prime}+k^{\prime}=0$ . Since $j^{\prime}+k^{\prime}=0$ and both $j^{\prime},k^{\prime}$ are non-negative, we conclude that $j^{\prime}=k^{\prime}=0$ . Thus, $\mathcal{S}_{2}^{(1)}=\{(i,0,0,0)\}$ . Now, noticing from (C) that $\mathbf{A}_{i,0}\mathbf{B}_{0,0}$ does not contribute to any term in $p_{2}(x)$ , we conclude that the matrix coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{2}(x)$ is only due to the set $\mathcal{S}_{2}^{(2)}$ . Recalling that $\mathcal{S}_{2}^{(2)}=\{(i^{\prime},j^{\prime},k^{\prime},l^{\prime}):-m_{2}+1+j^{\prime}-i^{\prime}\alpha+k^{\prime}+l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma\}$ , we rewrite $-m_{2}+1+j^{\prime}-i^{\prime}\alpha+k^{\prime}+l^{\prime}\gamma=m_{2}-1+i\alpha+l\gamma$ as

[TABLE]

From (181), we conclude that $l=l^{\prime}$ . Otherwise, $(j^{\prime}+k^{\prime}-2m_{2}+2)-(i^{\prime}+i)\alpha=c\gamma$ , for some integer $c$ , a contradiction since $|(j^{\prime}+k^{\prime}-2m_{2}+2)-(i^{\prime}+i)\alpha|<\gamma$ . Moreover, now (181) reduces to

[TABLE]

Again, since $|j^{\prime}+k^{\prime}-2m_{2}+2|<\alpha$ , we conclude that $i^{\prime}+i=0$ . Since $i^{\prime}+i=0$ and both $i,i^{\prime}$ are non-negative, we conclude that $i^{\prime}=i=0$ , which implies that $j^{\prime}+k^{\prime}=2m_{2}-2$ . Since $j^{\prime}+k^{\prime}=2m_{2}-2$ and both $j^{\prime},k^{\prime}\leq m_{2}-1$ , we conclude that $j^{\prime}=k^{\prime}=m_{2}-1$ . Thus, $\mathcal{S}_{2}^{(2)}=\{(0,m_{2}-1,m_{2}-1,l)\}$ . Now, noticing from (C) that $\mathbf{A}_{0,m_{2}-1}\mathbf{B}_{m_{2}-1,l}$ does not contribute to any term in $p_{2}(x)$ , we conclude that the matrix coefficient of $T_{m_{2}-1+i\alpha+l\gamma}$ in $p_{2}(x)$ is $\mathbf{0}_{{N_{1}}/{m_{1}}\times{N_{3}}/{m_{3}}}$ . $\Box$
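The index-counting argument of this appendix can be cross-checked by brute force for small parameters. The sketch below uses only the Python standard library and enumerates the sets $\mathcal{S}_{1}$, $\mathcal{S}_{2}^{(1)}$ and $\mathcal{S}_{2}^{(2)}$ exactly as they are defined in the proof; the function name `index_sets` and the tested parameter triples are illustrative. It checks the index sets only, not the polynomial construction itself.

```python
from itertools import product


def index_sets(m1, m2, m3, i, l):
    """Enumerate S_1, S_2^(1), S_2^(2) for the target degree m2 - 1 + i*alpha + l*gamma."""
    alpha = 2 * m2 - 1
    gamma = alpha * (2 * m1 - 1)
    target = m2 - 1 + i * alpha + l * gamma
    S1, S21, S22 = [], [], []
    for ip, jp, kp, lp in product(range(m1), range(m2), range(m2), range(m3)):
        a = m2 - 1 - jp + ip * alpha     # Chebyshev degree attached to A_{i',j'}
        b = kp + lp * gamma              # Chebyshev degree attached to B_{k',l'}
        if a + b == target:
            S1.append((ip, jp, kp, lp))
        if a - b == target:
            S21.append((ip, jp, kp, lp))
        if b - a == target:
            S22.append((ip, jp, kp, lp))
    return S1, S21, S22


all_match = True
for m1, m2, m3 in [(2, 2, 2), (3, 2, 2), (2, 3, 3)]:
    for i, l in product(range(m1), range(m3)):
        S1, S21, S22 = index_sets(m1, m2, m3, i, l)
        # S_1 pairs A_{i,j'} only with B_{j',l}, which yields C_{i,l}.
        all_match &= S1 == [(i, j, j, l) for j in range(m2)]
        # S_2^(1) is nonempty only when l = 0, and then equals {(i, 0, 0, 0)}.
        all_match &= S21 == ([(i, 0, 0, 0)] if l == 0 else [])
        # S_2^(2) is nonempty only when i = 0, and then equals {(0, m2-1, m2-1, l)}.
        all_match &= S22 == ([(0, m2 - 1, m2 - 1, l)] if i == 0 else [])
print("index sets match the proof:", all_match)
```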

Appendix D Upper Bound on the Condition Number of Gaussian Matrices

We first introduce the following theorem from [31].

Theorem D.1

Let $\mathbf{A}$ be an $m\times m$ matrix, $m\geq 3$ , and let the entries of $\mathbf{A}$ be independent and identically distributed standard Gaussian random variables. Then, for all $\alpha>1$ ,

[TABLE]

where $\kappa_{2}(\mathbf{A})$ is the condition number of $\mathbf{A}$ with respect to the matrix norm induced by $\ell_{2}$ .

As a consequence, in the following, we extend the result in Theorem D.1 to bound the condition number of every $m\times m$ sub-matrix of a random $m\times P$ matrix with i.i.d. standard Gaussian entries, $P\geq m$ .

Proof:

For any subset $\mathcal{S}\subseteq\{1,2,\ldots,P\}$ , let $\mathbf{H}_{\mathcal{S}}$ denote the $m\times|\mathcal{S}|$ sub-matrix of $\mathbf{H}$ containing the columns of $\mathbf{H}$ corresponding to $\mathcal{S},$ and let $s=P-m$ . Then we have

[TABLE]

where $(1)$ follows from the union bound, and $(2)$ follows from Theorem D.1 and the fact that $\binom{P}{s}\leq P^{s}$ . ∎

Bibliography

[1] S. Dutta, V. Cadambe, and P. Grover, “Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products,” in Advances In Neural Information Processing Systems (NIPS), 2016, pp. 2092–2100.
[2] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication,” in Advances In Neural Information Processing Systems (NIPS), 2017, pp. 4403–4413.
[3] M. Fahim, H. Jeong, F. Haddadpour, S. Dutta, V. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” in Communication, Control, and Computing (Allerton), Oct 2017, pp. 1264–1270.
[4] S. Dutta, M. Fahim, F. Haddadpour, H. Jeong, V. R. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” CoRR, vol. abs/1801.10292, 2018, accepted to appear in IEEE Transactions on Information Theory.
[5] S. Dutta, Z. Bai, H. Jeong, T. M. Low, and P. Grover, “A unified coded deep neural network training strategy based on generalized polydot codes,” in 2018 IEEE International Symposium on Information Theory (ISIT), June 2018, pp. 1585–1589, http://arxiv.org/abs/1811.10751.
[6] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding,” in Machine Learning Systems Workshop, Advances in Neural Information Processing Systems (NIPS), 2016.
[7] ——, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning, 2017, pp. 3368–3376.
[8] M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” in Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, 2018, pp. 5606–5615. [Online]. Available: http://proceedings.mlr.press/v80/ye18a.html
