TL;DR
This paper introduces orthogonal polynomial-based codes for matrix multiplication that enhance numerical stability and fault tolerance, with theoretical bounds and empirical validation showing reduced errors compared to traditional Vandermonde-based methods.
Contribution
It develops new orthogonal polynomial codes, especially using Chebyshev polynomials, with proven bounds on condition numbers, leading to more numerically stable coded computing techniques.
Findings
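Orthogonal polynomial codes achieve the same fault/straggler tolerance as previous codes (MatDot and Polynomial Codes).
Every square sub-matrix of a Chebyshev-Vandermonde matrix evaluated on the Chebyshev grid has a condition number that grows at most polynomially in the number of nodes when the number of redundant nodes is fixed.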
Empirical results show significantly lower numerical errors with the new methods.
| Number of Workers | MatDot worst case relative error | OrthoMatDot worst case relative error | MatDot average relative error | OrthoMatDot average relative error |
|---|---|---|---|---|
| 30 | | | | |
| 50 | | | | |
| 80 | | | | |
| 150 | | | | |
Numerically Stable Polynomially Coded Computing
Mohammad Fahim and Viveck R. Cadambe
M. Fahim and V. Cadambe are with the Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802. This work will be presented in part at the IEEE International Symposium on Information Theory (ISIT), July 2019.
Abstract
We study the numerical stability of polynomial based encoding methods, which have emerged as a powerful class of techniques for providing straggler and fault tolerance in the area of coded computing. Our contributions are as follows:
1. We construct new codes for matrix multiplication that achieve the same fault/straggler tolerance as the previously constructed MatDot Codes and Polynomial Codes. Unlike previous codes that use polynomials expanded in a monomial basis, our codes use a basis of orthogonal polynomials.
2. We show that the condition number of every sub-matrix of a Chebyshev-Vandermonde matrix, evaluated on the -point Chebyshev grid, grows as for . An implication of this result is that, when Chebyshev-Vandermonde matrices are used for coded computing, for a fixed number of redundant nodes, the condition number grows at most polynomially in the number of nodes .
3. By specializing our orthogonal polynomial based constructions to Chebyshev polynomials, and using our condition number bound for Chebyshev-Vandermonde matrices, we construct new numerically stable techniques for coded matrix multiplication. We empirically demonstrate that our constructions have significantly lower numerical errors compared to previous approaches which involve inversion of Vandermonde matrices. We generalize our constructions to explore the trade-off between computation/communication and fault-tolerance.
4. We propose a numerically stable specialization of Lagrange coded computing. Motivated by our condition number bound, our approach involves the choice of evaluation points and a suitable decoding procedure that involves inversion of an appropriate Chebyshev-Vandermonde matrix. Our approach is demonstrated empirically to have lower numerical errors as compared to standard methods.
I Introduction
The recently emerging area of “coded computing” focuses on incorporating redundancy based on coding-theory-inspired strategies to tackle central challenges in distributed computing, including stragglers, failures, processing errors, communication bottlenecks and security issues. Such ideas have been applied to different large scale distributed computations such as matrix multiplication [1, 2, 3, 4, 5], gradient methods [6, 7, 8], linear solvers [9, 10, 11] and multi-variate polynomial evaluation [12]. An important idea that has emerged from this body of work is the use of novel, Reed-Solomon-like, polynomial based methods for encoding data. In polynomial based methods, each computation node stores a linearly encoded combination of the data partitions, where the data stored at different worker nodes can be interpreted as evaluations of an appropriate polynomial at different points. The nodes then perform computation on these encoded versions of the data, and a central master/fusion node aggregates the outputs of these computations to recover the overall result via a decoding process that inevitably involves polynomial interpolation. Much like Reed-Solomon codes, if the number of nodes performing the computation is higher than the number of evaluation points required for accurate interpolation, the overall computation is tolerant to faults and stragglers.
Perhaps the most striking application of polynomial based methods comes in the context of matrix multiplication. To multiply two matrices assuming that each node stores of each matrix, classical work in algorithm based fault tolerance [13] outlines a coding based method which has been analyzed in [14]. Reference [2] showed through polynomial based encoding methods that the result of just nodes can be used by the master node to recover the matrix-product. Remarkably, this means that polynomial based codes ensure that the recovery threshold (the worst-case number of nodes whose computation suffices to recover the overall matrix-product) does not grow with , the number of worker nodes in the distributed system, unlike the approaches of [13, 14]. The recovery threshold for matrix multiplication has been improved to via a code construction called MatDot Codes in [3], albeit at a higher communication/computation cost than the codes in [2]. A second prominent application of polynomial based methods is Lagrange coded computing [12], where coding is applied to multi-variate polynomial computing with guarantees of straggler resilience, security and privacy. Polynomial based methods are also useful in communication-efficient approaches to inverse problems and gradient methods [8, 15, 10].
Despite their enormous success, the scalability of polynomial based methods in practice is limited by an “inconvenient truth”: their numerical instability. The decoding methods for polynomial based methods require interpolating a degree polynomial using evaluation points. While this is numerically stable for classical error correcting codes for communication and storage, which are implemented over finite fields, we are concerned here with data processing applications where the operations are typically real-valued. The main reason for the instability is that, either implicitly or explicitly, interpolation effectively solves a linear system whose transform is characterized by a Vandermonde matrix. It is well known that the condition number of Vandermonde matrices with real-valued nodes grows exponentially in the dimension of the matrix [16, 17, 18, 19]. The large condition number means that small perturbations of the Vandermonde matrix due to numerical precision errors can result in singular matrices [20, 21]. In practice, this can translate to large numerical errors even when the coded computation is distributed among a few tens of nodes. (For example, [22] reports that “In our experiments we observed large floating point errors when inverting high degree Vandermonde matrices for polynomial interpolation”.) Conventional intuition dictates that the main scalability bottlenecks in distributed computing include computation cost per worker, communication bottlenecks, and stragglers. However, for polynomially coded computing, it turns out that numerical stability is also critical and constitutes a major bottleneck for the scalability of such codes. Indeed, a polynomially coded computing scheme that achieves the minimum recovery threshold, and that is optimal in terms of computation and communication, will simply fail once implemented on a distributed system with tens of computing nodes due to large numerical errors.
Thus, the main contribution of our paper is a new numerically stable approach to polynomially coded computing.
II Summary of Contributions
In this paper, we develop a new, numerically stable approach to polynomially coded computing. A significant difference from previous polynomial coding approaches is that we depart from the monomial basis, which allows us to circumvent the inherently ill-conditioned Vandermonde matrices. We demonstrate our approach through two important applications of polynomially coded computing: matrix multiplication and Lagrange coded computing.
To illustrate our results, consider the coded matrix multiplication problem, where the goal is to multiply two matrices over computation nodes where each node stores of each of the two matrices. A master node encodes into matrices each, and sends these matrices respectively to each worker node. Each worker node multiplies the received encoded matrices, and sends the product back to the fusion node (the master and fusion nodes are logical entities; in practice, they may be the same node, or may be emulated in a decentralized manner by the computation nodes), which aims to recover from a subset of the worker nodes. The recovery threshold is defined as a number such that the computation of any set of worker nodes suffices to recover the product . The MatDot scheme of [3] achieves the best known recovery threshold of . We begin with an example of MatDot Codes for .
Example 1: MatDot Codes [3], recovery threshold = 3: Consider two matrices
[TABLE]
where are matrices and are matrices. Define and and let be distinct real values. Notice that is the coefficient of in polynomial . In MatDot Codes, as illustrated in Fig. 1, worker node computes so that from any of the nodes, the polynomial can be interpolated. Having interpolated the polynomial, the product is simply the coefficient of .
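Example 1 can be sketched numerically. This is a minimal NumPy illustration, not the authors' code; the matrix size, the random inputs, the five worker points and the choice of surviving workers are assumptions made for illustration. Decoding solves a 3 x 3 Vandermonde system, which is exactly the monomial-basis interpolation whose conditioning the rest of the paper is concerned with.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                         # illustrative matrix size (assumed even)
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))

# A = [A1 A2] (column blocks), B = [B1; B2] (row blocks), so A @ B = A1 @ B1 + A2 @ B2
A1, A2 = A[:, : N // 2], A[:, N // 2 :]
B1, B2 = B[: N // 2, :], B[N // 2 :, :]

def p_A(x):
    return A1 + A2 * x

def p_B(x):
    return B1 * x + B2

# p_A(x) @ p_B(x) = A1 B2 + (A1 B1 + A2 B2) x + A2 B1 x^2
xs = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # one distinct point per worker (5 workers)
outputs = [p_A(x) @ p_B(x) for x in xs]      # worker products

survivors = [0, 2, 4]                        # any 3 workers are enough
V = np.vander(xs[survivors], 3, increasing=True)              # rows [1, x, x^2]
Y = np.stack([outputs[k] for k in survivors]).reshape(3, -1)
coeffs = np.linalg.solve(V, Y)               # row r holds the coefficient of x^r
C = coeffs[1].reshape(N, N)                  # A @ B is the coefficient of x
```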
A generalization of the above example leads to a recovery threshold of , with a decoding process that involves effectively inverting a Vandermonde matrix. It has been shown that the condition number of the Vandermonde matrix grows exponentially in with both and norms [16, 17]. The intuition behind the inherently poor conditioning of the monomial basis is demonstrated in Fig. 4.
Motivated by Fig. 4, we aim in this paper to choose polynomials that are orthonormal. However, it is not immediately clear whether orthonormal polynomials are applicable to matrix multiplication. We demonstrate the applicability of orthonormal codes to matrix multiplication. For the example below, let denote two orthonormal polynomials such that
[TABLE]
where has degree .
Example 2: OrthoMatDot Codes [This paper], recovery threshold = 3: For two matrices let and . Notice that because of (1), we have
[TABLE]
This leads to the following coded computing scheme: worker node computes where are distinct real values, so that from any of the nodes, the fusion node can interpolate . Having interpolated the polynomial, the fusion node obtains the product by performing . This example is illustrated in Fig. 5.
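A minimal sketch of Example 2 follows. It assumes a concrete orthonormal pair, the first two orthonormal Chebyshev polynomials q0(x) = 1/sqrt(pi) and q1(x) = sqrt(2/pi) x relative to w(x) = 1/sqrt(1 - x^2) on [-1, 1], and carries out the fusion node's integration with a 2-point Gauss-Chebyshev rule, in the spirit of the Gauss quadrature decoding discussed in this section. The helper `lagrange_eval`, the worker points and the survivor set are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
A1, A2 = A[:, : N // 2], A[:, N // 2 :]
B1, B2 = B[: N // 2, :], B[N // 2 :, :]

# Assumed orthonormal pair relative to w(x) = 1/sqrt(1 - x^2) on [-1, 1]
def q0(x):
    return 1 / np.sqrt(np.pi)

def q1(x):
    return np.sqrt(2 / np.pi) * x

def p_A(x):
    return A1 * q0(x) + A2 * q1(x)

def p_B(x):
    return B1 * q0(x) + B2 * q1(x)

def lagrange_eval(nodes, values, t):
    # Value at t of the polynomial that interpolates (nodes[j], values[j])
    total = np.zeros_like(values[0])
    for j, xj in enumerate(nodes):
        lj = np.prod([(t - xk) / (xj - xk) for k, xk in enumerate(nodes) if k != j])
        total = total + lj * values[j]
    return total

xs = np.array([-0.9, -0.4, 0.1, 0.5, 0.8])    # 5 workers, distinct points in (-1, 1)
outputs = [p_A(x) @ p_B(x) for x in xs]      # degree-2 matrix polynomial p_C at each x_i

survivors = [1, 3, 4]                        # any 3 evaluations determine p_C
nodes = xs[survivors]
values = [outputs[k] for k in survivors]

# 2-point Gauss-Chebyshev rule: roots of T_2, weights pi/2; exact up to degree 3
roots = np.cos(np.array([1, 3]) * np.pi / 4)
C = sum((np.pi / 2) * lagrange_eval(nodes, values, r) for r in roots)
```

By orthonormality, the weighted integral of p_A p_B equals A1 B1 + A2 B2, which is why the quadrature sum reproduces the product.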
A simple generalization of the above example, described in Construction 1 in Section IV, leads to a class of codes, which we refer to as OrthoMatDot Codes, with a recovery threshold of , the same recovery threshold as MatDot Codes. In general, orthonormal polynomials can be defined with respect to an arbitrary weight measure; well known classes of polynomials corresponding to different weight measures include Legendre, Chebyshev, Jacobi and Laguerre polynomials [20, 21] (see Section III for definitions). Our OrthoMatDot Codes in Section IV can use any weight measure, and can therefore be used with different classes of orthonormal polynomials. Of particular interest to our paper are the Chebyshev polynomials (Fig. 4).
With our basic template, the task of developing numerically stable codes boils down to (A) interpolating in a numerically stable manner, and (B) integrating this polynomial in a numerically stable manner. For task (B), we use a decoding procedure via Gauss Quadrature [20, 23, 21] to recover the integral. Task (A) is particularly challenging in the coding setting, because our goal is to interpolate the coefficients of (expanded over a series of orthonormal polynomials) from any points among a set of points.
In Section V, we provide a specialization of the class of OrthoMatDot Codes, a numerically stable matrix multiplication code construction that has the same recovery threshold and communication/computation cost per worker as MatDot Codes. The construction specializes the class of OrthoMatDot Codes via the use of Chebyshev polynomials, which are a class of orthogonal polynomials that are ubiquitous in numerical methods and approximation theory [21]. Construction 2 also specifies the choice of evaluation points .
The decoding procedure outlined for the specialization of OrthoMatDot Codes in Section V involves the effective inversion of some sub-matrix of a Chebyshev-Vandermonde matrix [19], whose -th column contains evaluations of the first Chebyshev polynomials at . A key technical result of our paper shows that, with our choice of evaluation points , every square sub-matrix of the Chebyshev-Vandermonde matrix is well-conditioned. More precisely, we show that, with our choice of , the condition number of any sub-matrix of the Chebyshev-Vandermonde matrix grows at most polynomially in when the number of redundant parity nodes is fixed. Our condition number bound may be viewed as a result of independent interest in the area of numerical methods, and requires non-trivial use of techniques from numerical approximation theory. This result is in contrast with the well known exponential growth for Vandermonde systems. We demonstrate the significant improvement in stability via numerical experiments in Section V-C, and provide a preview of the results here in Table I, which shows that, remarkably, our Chebyshev-Vandermonde construction with even nodes has a smaller relative error than the Vandermonde-based MatDot Codes. (We note that the numerical error depends not only on the condition number of the matrix, but also on the algorithm used for solving the linear system. However, we are not aware of any approach that can accurately solve, say, a linear system with a Vandermonde matrix (see, e.g., [24, 25]) with nodes.)
While MatDot Codes [3] have an optimal recovery threshold of , they have a relatively higher computation cost per worker () and worker-to-fusion-node communication cost () compared to Polynomial Codes [2], which have a computation cost per worker of and a worker-to-fusion-node communication cost of . In particular, each worker in MatDot Codes performs an “outer” product of an matrix with a matrix, whereas each worker in Polynomial Codes performs an “inner” product of a matrix with a matrix. The reduced computation/communication comes at the cost of weaker fault tolerance: Polynomial Codes have a higher recovery threshold of , as compared with MatDot Codes (). In Section VI, we develop numerically stable codes for matrix multiplication, again via orthogonal polynomials, that achieve the same low computation/communication costs as Polynomial Codes as well as the same recovery threshold; we refer to these codes as OrthoPoly Codes.
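For contrast with MatDot, a minimal sketch of the Polynomial Codes idea of [2] in the monomial basis is given below. The block layout (A split into row blocks, B into column blocks), the sizes and the evaluation points are assumptions made for illustration. Each worker computes a small “inner” product, and in this sketch any m^2 = 4 workers suffice to reassemble the product.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, m = 6, 2
A, B = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))

# Assumed convention: A in m row blocks, B in m column blocks; block (i, j) of A @ B is A_i @ B_j
A_blocks = np.split(A, m, axis=0)                 # each (dim/m) x dim
B_blocks = np.split(B, m, axis=1)                 # each dim x (dim/m)

def p_A(x):
    return sum(Ai * x**i for i, Ai in enumerate(A_blocks))

def p_B(x):
    return sum(Bj * x**(j * m) for j, Bj in enumerate(B_blocks))

# p_A(x) @ p_B(x) = sum_{i,j} A_i B_j x^(i + j m): the m^2 exponents are all distinct
xs = np.linspace(-1, 1, 6)                        # 6 workers
outputs = [p_A(x) @ p_B(x) for x in xs]           # each output is (dim/m) x (dim/m)

survivors = [0, 1, 3, 5]                          # any m^2 = 4 workers
V = np.vander(xs[survivors], m * m, increasing=True)
Y = np.stack([outputs[k] for k in survivors]).reshape(m * m, -1)
coeffs = np.linalg.solve(V, Y).reshape(m * m, dim // m, dim // m)

# Reassemble: the coefficient of x^(i + j m) is block (i, j) of A @ B
C = np.block([[coeffs[i + j * m] for j in range(m)] for i in range(m)])
```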
The trade-off between computation/communication cost and recovery threshold imposed by MatDot Codes and Polynomial Codes has motivated general code constructions that interpolate between them [3, 5, 26], albeit using the monomial basis. In Section VII, we extend our approach to a general matrix multiplication code construction, referred to as Generalized OrthoMatDot, that offers a trade-off between computation/communication cost and recovery threshold, following the research thread for the monomial basis [3, 5, 26]; unlike those constructions, however, ours also targets numerical stability. While our Generalized OrthoMatDot Codes specialize to OrthoMatDot Codes, i.e., they achieve the same optimal recovery threshold as OrthoMatDot Codes when allowing for the same computation/communication cost, they do not specialize to OrthoPoly Codes. Specifically, Generalized OrthoMatDot Codes have a higher recovery threshold than OrthoPoly Codes when allowing for the same computation/communication cost as OrthoPoly Codes. In Section VIII, we exploit the result obtained in Theorem V.1 on the condition number of the square sub-matrices of Chebyshev-Vandermonde matrices to propose a numerically stable algorithm for Lagrange coded computing. In Section IX, we conclude with a discussion of other related problems such as matrix-vector multiplication [13, 27], and describe some related open questions.
III Preliminaries on Numerical Analysis and Notations
In this section, we discuss the problem of finite precision in representing real numbers on digital machines and how it can severely affect the output of computations performed on these machines. We also introduce some basic definitions and results from numerical approximation theory that will be used in this paper [23, 28]. At the end of this section, we summarize the common notation used in this paper.
III-A Preliminaries on Numerical Analysis
Since digital machines have finite memory, real numbers are stored using a finite number of bits, i.e., with finite precision. This inevitably leads to errors, since a finite number of bits can represent only finitely many real numbers exactly. Real numbers that cannot be represented exactly with the available bits have to be either truncated or rounded off in order to fit in memory. Although such a perturbation (truncation or round-off error) of a real number can be negligibly small, the resulting perturbation of the output of a computation that uses such perturbed inputs is not necessarily small. In fact, a very small perturbation of the input of some computations may lead to an output that is totally wrong and unrelated to the correct output. The condition number of a computation problem quantifies this sensitivity.
Definition III.1 (Condition Number)
Let be a function representing a computation problem with input , let be a small perturbation of , and define to be the resulting perturbation of . The condition number of the problem at with respect to some norm is
[TABLE]
Given the above definition of condition number, a problem is said to be “ill-conditioned” if small perturbations in the input lead to large perturbation in the output (i.e., the condition number is large). On the other hand, a problem is said to be “well-conditioned” if small perturbations in the input lead to small perturbations in the output (i.e., the condition number is small).
In what follows, we discuss the condition number of two computation problems: matrix-vector multiplication and solving a system of linear equations. For both problems, consider the system of linear equations represented in the matrix form , where and is non-singular, and , and let be some matrix norm. With fixed, the condition number of the matrix-vector multiplication problem with as its output, given small perturbations in the input , is , for any . Similarly, with still fixed, the condition number of solving the system of linear equations , given small perturbations in the input , where is the output, is , for any .
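The condition number bound for solving a linear system can be checked numerically. This sketch assumes the 2-norm and an arbitrary 8 x 8 Vandermonde matrix; it picks the right-hand side and the perturbation along the extreme singular vectors, the choice for which the relative output error equals the condition number times the relative input error.

```python
import numpy as np

# Assumed example matrix: an 8 x 8 Vandermonde matrix on equispaced points in [-1, 1]
A = np.vander(np.linspace(-1, 1, 8), increasing=True)
U, s, Vt = np.linalg.svd(A)
kappa = s[0] / s[-1]                         # 2-norm condition number ||A|| ||A^{-1}||

# Worst case for solving A x = b: b along the top left-singular vector,
# the perturbation delta_b along the bottom one
b = U[:, 0]
delta_b = 1e-8 * U[:, -1]
x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + delta_b)

rel_in = np.linalg.norm(delta_b) / np.linalg.norm(b)
rel_out = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
amplification = rel_out / rel_in             # equals kappa (up to rounding) for this choice
print(f"kappa = {kappa:.3e}, amplification = {amplification:.3e}")
```

A random perturbation would usually be amplified by less than the condition number; the singular-vector choice shows that the bound is attained.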
Since we focus on polynomially coded computing, we next introduce some basic tools of numerical approximation theory that will be used throughout this paper. In the following, denotes the vector space of continuous integrable functions defined on the interval .
Definition III.2 (Inner Products on )
For any , and given a non-negative integrable weight function ,
[TABLE]
defines an inner product on relative to .
Definition III.3 (Orthogonal Polynomials)
Consider a non-negative integrable weight function , the polynomials in where has degree and
[TABLE]
for some non-zero values , where the inner product is relative to , are called orthogonal polynomials relative to , .
Definition III.4 (Orthonormal Polynomials)
Consider a non-negative integrable weight function , the polynomials , where has degree , in such that
[TABLE]
where the inner product is relative to , are called orthonormal polynomials relative to .
Note that, based on the above definitions, if the polynomials are orthogonal (or orthonormal), then is orthogonal to all polynomials of degree , i.e., , for any polynomial with degree strictly less than . It is also worth noting that for , the orthogonal polynomials are the Legendre polynomials, which are obtained by applying the Gram-Schmidt procedure to sequentially. In addition, the following is an important class of orthogonal polynomials in our paper.
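The Gram-Schmidt route to the Legendre polynomials can be illustrated in a few lines. This sketch assumes the weight w(x) = 1 on [-1, 1], with inner products evaluated exactly by NumPy's 10-point Gauss-Legendre rule; `inner` and `gram_schmidt_monomials` are hypothetical helper names. The output is monic, so it matches the Legendre polynomials only up to scaling (for example, P_2(x) = (3x^2 - 1)/2 = 1.5 (x^2 - 1/3)).

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss

NODES, WEIGHTS = leggauss(10)                 # exact for integrands of degree <= 19

def inner(f, g):
    # <f, g> = integral over [-1, 1] of f(x) g(x) w(x) dx with w(x) = 1
    return float(np.sum(WEIGHTS * f(NODES) * g(NODES)))

def gram_schmidt_monomials(n):
    # Orthogonalize 1, x, x^2, ... in order (monic result, no normalization)
    basis = []
    for k in range(n):
        p = Polynomial.basis(k)               # x^k
        for q in basis:
            p = p - (inner(p, q) / inner(q, q)) * q
        basis.append(p)
    return basis

basis = gram_schmidt_monomials(4)
for p in basis:
    print(np.round(p.coef, 6))                # coefficients, lowest degree first
```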
Example III.1 (Chebyshev polynomials of the first kind)
The following recurrence relation defines the Chebyshev polynomials of the first kind:
[TABLE]
where . These Chebyshev polynomials are the cornerstone of modern numerical approximation theory and practice, with applications to numerical integration and least-squares approximation of continuous functions [23, 28]. The polynomials are orthonormal relative to the weight function . In general, Chebyshev polynomials are defined over . However, for , , for any . For the rest of this paper, unless otherwise stated, whenever Chebyshev polynomials are used, they are restricted to the range .
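As a quick check of the recurrence, the sketch below evaluates Chebyshev polynomials of the first kind with the standard three-term recurrence (T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}) and compares them on [-1, 1] with the standard closed form cos(n arccos x). The function name `chebyshev_T` and the test grid are assumptions for illustration.

```python
import numpy as np

def chebyshev_T(n, x):
    # Three-term recurrence: T_0 = 1, T_1 = x, T_{k+1} = 2 x T_k - T_{k-1}
    x = np.asarray(x, dtype=float)
    t_prev, t_curr = np.ones_like(x), x.copy()
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

x = np.linspace(-1, 1, 201)
# On [-1, 1] the recurrence agrees with the closed form cos(n arccos x)
max_err = max(np.max(np.abs(chebyshev_T(n, x) - np.cos(n * np.arccos(x)))) for n in range(10))
print(f"max deviation for n < 10: {max_err:.1e}")
```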
We next state two results from [28], Theorems III.1 and III.2.
Theorem III.1
Let be a weight function on the range , i.e., is a non-negative integrable function on , and let be distinct real numbers such that . Then there exist unique weights such that
[TABLE]
for all polynomials with degree less than .
Theorem III.1 is not surprising: the left hand side of the equation stated in the theorem is a linear functional on the vector space of -degree polynomials. By Lagrange interpolation, every polynomial in this space is a linear function of its evaluations at points. Therefore, the left hand side can be expressed as a weighted sum of the function's evaluations at the points. We next state a remarkable result due to Gauss, which gives conditions under which the expression of Theorem III.1 is exact for polynomials of degree up to even though the number of evaluation points is just .
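The argument above suggests one way to compute the weights of Theorem III.1: require exactness on the monomials of degree less than n and solve the resulting linear system (equivalently, integrate the Lagrange basis polynomials against w). The sketch below assumes w(x) = 1 on [-1, 1] and four arbitrary distinct points; `quadrature_weights` and `unit_weight_moments` are hypothetical names, and the Vandermonde solve is only reasonable for such a small n.

```python
import numpy as np

def quadrature_weights(points, moments):
    # Solve sum_i a_i * x_i**k = integral of x^k w(x) dx for k = 0, ..., n - 1,
    # which is equivalent to a_i = integral of w(x) * (i-th Lagrange basis polynomial) dx
    n = len(points)
    V = np.vander(points, n, increasing=True).T   # V[k, i] = x_i ** k
    return np.linalg.solve(V, moments[:n])

def unit_weight_moments(n):
    # Assumed example weight w(x) = 1 on [-1, 1]: moments 2/(k+1) for even k, 0 for odd k
    return np.array([2 / (k + 1) if k % 2 == 0 else 0.0 for k in range(n)])

points = np.array([-0.8, -0.1, 0.3, 0.9])      # any 4 distinct points
a = quadrature_weights(points, unit_weight_moments(4))

def cubic(x):
    return 3 * x**3 - x**2 + 2 * x + 1          # degree 3 < n = 4

approx = np.sum(a * cubic(points))
exact = 4 / 3                                   # integral of cubic over [-1, 1]
```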
Theorem III.2 (Gauss Quadrature)
Fix a weight function , and let be a set of orthonormal polynomials in relative to . Given , let be the roots of such that , and choose real values such that , for any with degree less than . Then, , for any polynomial with degree less than .
Remark III.1
1. Consider any orthonormal polynomials . For any , the set forms a basis for the vector space of polynomials with degree less than .
2. In Theorem III.2, can be chosen as
[TABLE]
3. In Theorem III.2, the roots of , i.e., are, in fact, real and distinct. Moreover, the Chebyshev polynomial of the first kind has the following roots
[TABLE]
The set is often called the -point Chebyshev grid, and its elements are called “Chebyshev nodes” of degree . We here discard the term “node” and use the term “Chebyshev points” to avoid confusion with computation nodes. We also denote by the vector . It is useful to note that can be written as
[TABLE]
and for , the weights in (9) are all equal to when
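For the Chebyshev weight w(x) = 1/sqrt(1 - x^2) on [-1, 1], the Gauss rule of Theorem III.2 uses the roots of T_n as nodes, with all weights equal to pi/n (a standard fact). The sketch below, with the hypothetical helper `gauss_chebyshev`, checks exactness below degree 2n and shows that exactness fails at degree 2n.

```python
import numpy as np

def gauss_chebyshev(n):
    # Nodes: roots of T_n, cos((2k - 1) pi / (2n)); weights: pi / n each,
    # for the weight w(x) = 1/sqrt(1 - x^2) on [-1, 1]
    k = np.arange(1, n + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * n)), np.full(n, np.pi / n)

nodes, weights = gauss_chebyshev(3)
I4 = np.sum(weights * nodes**4)   # degree 4 < 2n: exact, true value 3*pi/8
I5 = np.sum(weights * nodes**5)   # degree 5 < 2n: exact, true value 0
I6 = np.sum(weights * nodes**6)   # degree 6 = 2n: not exact (true value 5*pi/16)
```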
III-B Notations
Throughout this paper, we use lowercase bold letters to denote vectors and uppercase bold letters to denote matrices. In addition, for any positive integers , and given a set of orthogonal polynomials on the interval , let be a vector with entries in , we define the matrix as:
[TABLE]
For any subset , we denote by the sub-matrix of formed by concatenating columns with indices in , i.e.,
[TABLE]
For the special case where the orthogonal polynomials are the Chebyshev polynomials of the first kind , we define the matrix as:
[TABLE]
we denote by the sub-matrix of formed by concatenating columns with indices in , i.e.,
[TABLE]
Also, for the case where the orthogonal polynomials are the “orthonormal” Chebyshev polynomials , we define the matrix as:
[TABLE]
and we denote by the sub-matrix of formed by concatenating columns with indices in , i.e.,
[TABLE]
Wherever there is no ambiguity on , it may be dropped from the notation.
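The following sketch assumes the K x N layout described in Section II, in which the j-th column holds the first K Chebyshev polynomials evaluated at the j-th point; NumPy's `chebvander` returns the transpose of that layout. The helper names, the sizes and the (0-based) column subset are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def cheb_grid(n):
    # n-point Chebyshev grid: roots of T_n
    return np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))

def cheb_vandermonde(x, K):
    # K x N matrix whose j-th column is [T_0(x_j), ..., T_{K-1}(x_j)]
    return cheb.chebvander(np.asarray(x, dtype=float), K - 1).T

def orthonormal_cheb_vandermonde(x, K):
    # Same layout, rows rescaled to the orthonormal 1/sqrt(pi) T_0 and sqrt(2/pi) T_i
    scale = np.full(K, np.sqrt(2 / np.pi))
    scale[0] = 1 / np.sqrt(np.pi)
    return scale[:, None] * cheb_vandermonde(x, K)

N, K = 6, 4
x = cheb_grid(N)
G = cheb_vandermonde(x, K)                 # 4 x 6
S = [0, 2, 3, 5]                           # 0-based column indices
G_S = G[:, S]                              # square K x K sub-matrix
G_tilde_S = orthonormal_cheb_vandermonde(x, K)[:, S]
```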
In the next section, we show that orthonormal polynomials can be used for designing codes for the distributed large scale matrix multiplication problem.
IV OrthoMatDot: Orthonormal Polynomials based Codes for Distributed Matrix Multiplication
In this section, we present OrthoMatDot, a new class of codes for matrix multiplication based on orthonormal polynomials. These codes achieve the same recovery threshold as MatDot Codes and have computational complexity similar to that of MatDot Codes. The main advantage of the proposed codes is that they avoid the ill-conditioned monomial basis used in previous work (e.g., in [3, 2, 5, 26]). In Section V, OrthoMatDot Codes will be specialized and shown to have higher numerical stability than the state of the art. We begin with a formal problem formulation in Section IV-A, and describe our codes in Section IV-B.
IV-A System Model and Problem Formulation
IV-A1 System Model
We consider the distributed framework depicted in Fig. 6, which consists of a master node, worker nodes, and a fusion node, where the only communication allowed is from the master node to the different worker nodes and from the worker nodes to the fusion node. The fusion node and the master node may be the same node; in this case, the only communication allowed is between the master node and each worker node.
IV-A2 Problem Formulation
The master node possesses two real-valued input matrices , with dimensions , , respectively. Every worker node receives from the master node an encoded matrix of of dimension and an encoded matrix of of dimension , and multiplies these two received inputs. Each worker node then sends the result to the fusion node. The fusion node needs to recover the matrix product once it receives the results of any worker nodes, where . In this case, is called the recovery threshold of the distributed computing scheme.
IV-B OrthoMatDot Code Construction
Our result regarding the existence of achievable codes solving the distributed matrix multiplication problem using orthonormal polynomials is stated in the following theorem.
Theorem IV.1
For the matrix multiplication problem described in Section IV-A2 computed on the system defined in Section IV-A1, a recovery threshold of is achievable using any set of orthonormal polynomials relative to some weight function and defined on a range .
Before proving this theorem, we first present OrthoMatDot, a code construction that achieves the recovery threshold of given any set of orthonormal polynomials relative to a weight function and defined on a range . In our code construction, we assume that matrix is split vertically into equal sub-matrices, of dimension each, and matrix is split horizontally into equal sub-matrices, of dimension each, as follows:
[TABLE]
we also define a set of distinct real numbers in the range , and define two encoding polynomials and and let .
In the following, we briefly describe the OrthoMatDot construction. First, for every , the master node sends to the -th worker node evaluations of at , that is, it sends and to the -th worker node. Next, for every , the -th worker node computes the matrix product and sends the result to the fusion node. Once the fusion node receives the output of any worker nodes, it interpolates the polynomial , and evaluates at , where are the roots of . Then, it performs the summation , where are as in (9).
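The steps of Construction 1 can be sketched end to end. Since the construction works for any orthonormal family, this sketch assumes orthonormal Legendre polynomials (weight w(x) = 1 on [-1, 1]) together with NumPy's Gauss-Legendre rule, and it interpolates with a plain Lagrange formula. The helper names, sizes, evaluation points and survivor set are hypothetical, and this is not the paper's Construction 1 listing.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def orthonormal_legendre(x, m):
    # [q_0(x), ..., q_{m-1}(x)] with q_i = sqrt((2i + 1) / 2) P_i, orthonormal for w(x) = 1
    return leg.legvander(np.atleast_1d(x), m - 1)[0] * np.sqrt((2 * np.arange(m) + 1) / 2)

def lagrange_eval(nodes, values, t):
    # Value at t of the polynomial that interpolates (nodes[j], values[j])
    total = np.zeros_like(values[0])
    for j, xj in enumerate(nodes):
        lj = np.prod([(t - xk) / (xj - xk) for k, xk in enumerate(nodes) if k != j])
        total = total + lj * values[j]
    return total

rng = np.random.default_rng(3)
dim, m, num_workers = 6, 3, 7
A, B = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
A_blocks = np.split(A, m, axis=1)                 # A = [A_1 ... A_m]
B_blocks = np.split(B, m, axis=0)                 # B = [B_1; ...; B_m]

xs = np.linspace(-0.9, 0.9, num_workers)          # distinct evaluation points
outputs = []
for x in xs:
    q = orthonormal_legendre(x, m)
    p_A = sum(qi * Ai for qi, Ai in zip(q, A_blocks))    # encoding at the master node
    p_B = sum(qi * Bi for qi, Bi in zip(q, B_blocks))
    outputs.append(p_A @ p_B)                            # worker computation

survivors = [0, 2, 3, 5, 6]                       # any 2m - 1 = 5 workers
nodes = xs[survivors]
values = [outputs[k] for k in survivors]

# m-point Gauss-Legendre rule: nodes are the roots of q_m, exact up to degree 2m - 1
roots, weights = leg.leggauss(m)
C = sum(wr * lagrange_eval(nodes, values, r) for r, wr in zip(roots, weights))
```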
We formally present the OrthoMatDot code in Construction 1. Construction 1 uses the following notation. The output of the algorithm is the matrix . The -th entries of the matrix polynomial and the matrix are respectively denoted as and . The reader may also recall the definition of matrices and for any subset . is the vector of the roots of . Based on Construction 1, we state the following claim.
Claim IV.2
The proof of Claim IV.2 is provided in Appendix A.
Now, we can prove Theorem IV.1.
Proof:
In order to prove the theorem, it suffices to show that Construction 1 is a valid construction with a recovery threshold of . Therefore, in the following, we prove that Construction 1 can recover after the fusion node receives the output of at most worker nodes. Assume that the fusion node has already received the results of any worker nodes. Now, because the polynomial has degree , the evaluations of at any distinct points is sufficient to interpolate the polynomial, and since are distinct, the fusion node can interpolate once it receives the output of any worker nodes. Afterwards, given that (Claim IV.2), the fusion node can evaluate and perform the scaled summation to recover . ∎
Remark IV.1
In Construction 1, setting to be the roots of leads to faster decoding in scenarios in which the first worker nodes send their results but fewer than workers succeed in sending their outputs. For such scenarios, we have , where the last equality follows from Claim IV.2.
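Remark IV.1 can be illustrated with the Chebyshev weight, where the Gauss nodes are the roots of T_m and all weights equal pi/m. The sketch assumes that the first m workers are placed at those roots and that all of them respond, so the fusion node only forms a weighted sum with no interpolation; helper names, sizes and the extra worker points are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def cheb_grid(n):
    # Roots of T_n: cos((2i - 1) pi / (2n)), i = 1..n
    return np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))

def orthonormal_cheb(x, m):
    # [q_0(x), ..., q_{m-1}(x)] with q_0 = T_0/sqrt(pi) and q_i = sqrt(2/pi) T_i
    scale = np.full(m, np.sqrt(2 / np.pi))
    scale[0] = 1 / np.sqrt(np.pi)
    return cheb.chebvander(np.atleast_1d(x), m - 1)[0] * scale

rng = np.random.default_rng(5)
dim, m = 6, 3
A, B = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
A_blocks, B_blocks = np.split(A, m, axis=1), np.split(B, m, axis=0)

def worker(x):
    q = orthonormal_cheb(x, m)
    p_A = sum(qi * Ai for qi, Ai in zip(q, A_blocks))
    p_B = sum(qi * Bi for qi, Bi in zip(q, B_blocks))
    return p_A @ p_B

# Assumed placement: the first m workers sit at the roots of T_m (the Gauss nodes);
# the remaining workers use other distinct points in (-1, 1)
xs = np.concatenate([cheb_grid(m), [-0.3, 0.2]])
outputs = [worker(x) for x in xs]

# The first m outputs are already p_C at the Gauss nodes, and every weight equals pi/m
C_fast = (np.pi / m) * sum(outputs[:m])
```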
Next, we study the computational and communication costs of OrthoMatDot.
IV-B1 Complexity Analyses of OrthoMatDot
Encoding Complexity: Encoding for each worker requires performing two additions, each adding scaled matrices of size and , for an overall encoding complexity for each worker of . Therefore, the overall computational complexity of encoding for workers is .
Computational Cost per Worker: Each worker multiplies two matrices of dimensions and , requiring operations.
Decoding Complexity: Since has degree , the interpolation of requires the inversion of a matrix, with complexity , and performing matrix-vector multiplications, each between the inverted matrix and a column vector of length of the received evaluations of the matrix polynomial at some position , with complexity . Next, the evaluation of the polynomial at requires a complexity of . Finally, performing the summation requires a complexity of . Thus, assuming that , the overall decoding complexity is .
Communication Cost: The master node sends symbols, and the fusion node receives symbols from the successful worker nodes.
Remark IV.2
With the reasonable assumption that the dimensions of the input matrices are large enough such that , we can conclude that the encoding and decoding costs at the master and fusion nodes, respectively, are negligible compared to the computation cost at each worker node.
V Numerically Stable Codes for Matrix Multiplication via OrthoMatDot Codes with Chebyshev Polynomials
In this section, we specialize OrthoMatDot Codes by restricting the orthonormal polynomials to be Chebyshev polynomials of the first kind, with the evaluation points chosen to be the -dimensional Chebyshev grid, i.e., . Our specialized OrthoMatDot, described in Construction 2 in Section V-A, uses a decoding procedure that involves the inversion of a sub-matrix of a Chebyshev-Vandermonde matrix. One of the main technical results of this section (and paper), presented in Theorem V.1 in Section V-B, is an upper bound on the worst case condition number over all possible sub-matrices of the Chebyshev-Vandermonde matrix for the case where the distinct evaluation points are chosen as the Chebyshev points of degree , i.e., . The derived bound shows that the worst case condition number grows at most polynomially in for a fixed number of straggler/parity worker nodes. This is in contrast with monomial basis codes, where the condition number grows exponentially in , even when there is no redundancy [16, 17, 18, 19]. We show through numerical experiments in Section V-C that our proposed codes provide significantly lower numerical errors than MatDot Codes [3].
V-A Chebyshev Polynomials based OrthoMatDot Code Construction
Recall from Example III.1 that form an orthonormal set of polynomials with respect to the weight function . Construction 2 applies these Chebyshev polynomials of the first kind to Construction 1. As in Construction 1, we assume that the input matrices and are split as in (38). Let be distinct real numbers in the range , define the encoding functions as and , and let .
The idea of our Chebyshev polynomials based OrthoMatDot code is as follows: First, for every , the master node sends to the -th worker node and . Next, for every , the -th worker node computes the matrix product and sends the result to the fusion node. Once the fusion node receives the output of any worker nodes, it interpolates . Then, it evaluates at where ’s are as defined in (10), and computes , where based on 3) in Remark III.1.
A formal description of our Chebyshev polynomials based OrthoMatDot code is provided in Construction 2, which uses the following notation. We denote the -th entry of the matrix polynomial by and write it as . Following the notation in Section III-B, we define the Chebyshev-Vandermonde matrices and ; for any subset , we also define the matrix . Finally, we assume that our construction returns an matrix representing the result of the product , where the -th entry of is .
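The encode/decode flow described above can be sketched in Python with NumPy. This is a minimal illustration under assumptions that are ours, not the authors' implementation: the orthonormal set is taken to be q_0 = T_0 and q_i = sqrt(2) T_i (orthonormal for the weight 1/(pi sqrt(1 - x^2)) on [-1, 1]), the workers use the P-point Chebyshev grid, and the final step is the m-node Gauss-Chebyshev rule (nodes at the roots of T_m, equal weights 1/m). The helper names `cheb_grid`, `q_basis` and `ortho_matdot` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_grid(n):
    """n-point Chebyshev grid cos((2i-1)pi/(2n)), i = 1..n (the roots of T_n)."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def q_basis(deg, x):
    """Rows q_0..q_deg evaluated at x, with q_0 = T_0 and q_i = sqrt(2) T_i (assumed normalization)."""
    Q = C.chebvander(np.asarray(x, dtype=float), deg).T.copy()
    Q[1:] *= np.sqrt(2.0)
    return Q

def ortho_matdot(A, B, m, P, survivors):
    A_blk = np.split(A, m, axis=1)            # A = [A_0 ... A_{m-1}] (column blocks)
    B_blk = np.split(B, m, axis=0)            # B = [B_0; ...; B_{m-1}] (row blocks)
    rho = cheb_grid(P)                        # one evaluation point per worker
    Qm = q_basis(m - 1, rho)                  # Qm[i, r] = q_i(rho_r)
    out = {}
    for r in survivors:                       # only non-straggling workers report back
        pA = sum(Qm[i, r] * A_blk[i] for i in range(m))
        pB = sum(Qm[i, r] * B_blk[i] for i in range(m))
        out[r] = pA @ pB                      # p_C(rho_r); p_C has degree 2m-2
    S = list(survivors)[: 2 * m - 1]          # any 2m-1 results suffice
    Y = np.stack([out[r] for r in S]).reshape(len(S), -1)
    # Interpolate p_C in the q-basis: Y[s] = sum_k q_k(rho_s) * coef[k].
    coef = np.linalg.solve(q_basis(2 * m - 2, rho[S]).T, Y)
    # Gauss-Chebyshev quadrature (exact up to degree 2m-1): AB = (1/m) sum_l p_C(eta_l).
    pC_eta = q_basis(2 * m - 2, cheb_grid(m)).T @ coef
    return pC_eta.mean(axis=0).reshape(A.shape[0], B.shape[1])
```

For example, `ortho_matdot(A, B, m=3, P=8, survivors=[0, 2, 3, 5, 7])` decodes from five of eight workers (2m-1 = 5). The quadrature step returns sum_i A_i B_i exactly because p_A p_B has degree 2m-2 and the q_i are orthonormal.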
V-A1 Complexity Analyses
The encoding complexity, computational cost per worker, decoding complexity, and communication cost of Chebyshev polynomials based OrthoMatDot are the same as those of OrthoMatDot stated in Section IV-B1.
V-B Evaluation Points and Condition Number Bound
When there is no redundancy, i.e., , the decoding matrix is well known to have condition number with respect to both the and the Frobenius norms [17]. Note the remarkable contrast with the Vandermonde matrix, whose condition number for real-valued evaluation points grows exponentially in no matter how the nodes are chosen [16, 17]. Our problem differs from the standard problem in numerical methods, since we have to choose a rectangular “generator” matrix in which every square sub-matrix is well-conditioned. In particular, even Chebyshev-Vandermonde matrices are poorly conditioned if the evaluation points are not chosen carefully [19] (see also Fig. 8). Here, we show that choosing leads to a well-conditioned system with redundant nodes. Our goal is to choose the vector such that is sufficiently small, where denotes the worst-case condition number over all possible sub-matrices of .
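The quantity being controlled here, the worst-case condition number over all square column sub-matrices of a rectangular generator, can be computed by brute force for small sizes. The sketch below is illustrative only and rests on our own choices: rows are the unnormalized Chebyshev polynomials T_0, ..., T_{k-1} (or the monomials 1, x, ..., x^{k-1} for comparison), both bases are evaluated on the same P-point grid cos((2i-1)pi/(2P)), and the spectral norm is used by default (pass `p='fro'` for the Frobenius norm of Theorem V.1). Function names are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_grid(n):
    """n-point Chebyshev grid cos((2i-1)pi/(2n)), i = 1..n."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def generator(k, x, basis="chebyshev"):
    """k x len(x) matrix whose (i, r) entry is the i-th basis polynomial at x[r]."""
    if basis == "chebyshev":
        return C.chebvander(x, k - 1).T            # rows T_0, ..., T_{k-1}
    if basis == "monomial":
        return np.vander(x, k, increasing=True).T  # rows 1, x, ..., x^{k-1}
    raise ValueError(f"unknown basis: {basis}")

def worst_case_cond(G, p=None):
    """Maximum condition number over all k x k column sub-matrices of the k x P matrix G."""
    k, P = G.shape
    return max(np.linalg.cond(G[:, list(cols)], p)
               for cols in itertools.combinations(range(P), k))

if __name__ == "__main__":
    for P, k in [(10, 8), (14, 12)]:
        x = cheb_grid(P)
        print(P, k,
              worst_case_cond(generator(k, x, "chebyshev")),
              worst_case_cond(generator(k, x, "monomial")))
```

With no redundancy (k = P), the Chebyshev generator on this grid has 2-norm condition number exactly sqrt(2), by the discrete orthogonality of T_0, ..., T_{P-1} on the roots of T_P, whereas the monomial generator on the same points is already badly conditioned for moderate P. The enumeration costs C(P, k) condition-number evaluations, so it is only practical for small systems.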
Theorem V.1
For any ,
[TABLE]
where denotes the worst-case condition number over all possible sub-matrices of with respect to the Frobenius norm, and are the roots of the Chebyshev polynomial , i.e., .
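Since the Frobenius norm upper-bounds the spectral norm, the condition number with respect to the spectral norm is at most its Frobenius counterpart, so the above bound applies to the standard (spectral) matrix norm as well. The proof uses techniques from numerical methods and is provided in Appendix B.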
Remark V.1
Although the bound in Theorem V.1 is derived for , the theorem also applies for . This is because it can be shown using simple matrix operations that for any , for a subset such that ,
V-C Numerical Results
The numerical stability of our codes is determined by the condition number of sub-matrices of . The natural comparison is with MatDot Codes, whose decoding effectively requires inverting square sub-matrices of .
[TABLE]
Based on the result of Theorem V.1, we choose . In our experiments, we consider systems with various numbers of worker nodes, namely, . We compare with . We also compare the average condition number of all sub-matrices of with that of all sub-matrices of . The results, in Fig. 7, show that, for every examined system, the maximum and average condition numbers of the sub-matrices of are lower than their MatDot Codes counterparts, especially for the larger systems with and worker nodes. For these systems, the condition number improves by a factor of around .
Fig. 8 shows how the maximum/average condition number of the sub-matrices of grows with the size of the distributed system for a fixed number of redundant worker nodes, namely 1 and 3, and compares it with MatDot Codes. The figure shows that, while MatDot Codes provide a reasonable condition number only for distributed systems with up to worker nodes, Construction 2 supports distributed systems with up to worker nodes under the same condition number bound .
Reflecting the significantly higher stability of Chebyshev polynomials based OrthoMatDot compared to MatDot Codes, Fig. 9 shows that Chebyshev polynomials based OrthoMatDot provides much more accurate outputs than MatDot Codes. For the experiments whose results are shown in Fig. 9, the entries of the input matrices are chosen independently according to the standard Gaussian distribution . In addition, for any two input matrices , let be the output of the distributed system (which is not necessarily equal to the correct answer ); we define the relative error between and as
[TABLE]
Fig. 9 shows how the maximum relative error (the worst-case relative error over all scenarios of successful nodes, for a fixed number of parity workers) grows with the size of the distributed system. In Fig. 9, we plot the average result of five different realizations of the system at each system size . The figure shows that MatDot Codes break down once the size of the system exceeds workers, producing a relative error of around . In contrast, our OrthoMatDot construction supports systems with up to worker nodes while incurring a relative error of only . It is also worth mentioning that, in our experiments, we use the MATLAB command [29] for matrix inversion. We also tried inverting the matrices with the Björck-Pereyra algorithm [30]; however, its results were much less accurate than those of , especially for large systems with a number of worker nodes .
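For reference, the Björck-Pereyra algorithm mentioned above solves the monomial-basis interpolation (dual Vandermonde) system in O(n^2) operations: it first forms Newton divided differences and then converts the Newton form into monomial coefficients. The sketch below follows the standard textbook formulation (as presented, e.g., in Golub and Van Loan, Matrix Computations) and is not the code used for the experiments reported here; `bjorck_pereyra` is a hypothetical name.

```python
import numpy as np

def bjorck_pereyra(x, f):
    """Return a with sum_j a[j] * x[i]**j == f[i] for all i (monomial interpolation)."""
    x = np.asarray(x, dtype=float)
    a = np.array(f, dtype=float)
    n = len(x) - 1
    # Stage 1: overwrite a with Newton divided differences f[x_0, ..., x_i].
    for k in range(n):
        for i in range(n, k, -1):
            a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k - 1])
    # Stage 2: expand the Newton form into monomial coefficients.
    for k in range(n - 1, -1, -1):
        for i in range(k, n):
            a[i] = a[i] - a[i + 1] * x[k]
    return a

# 1 + x^2 through the points 0, 1, 2 has monomial coefficients [1, 0, 1].
print(bjorck_pereyra([0.0, 1.0, 2.0], [1.0, 2.0, 5.0]))
```

Because it works in the monomial basis, this algorithm still targets the ill-conditioned Vandermonde problem; its accuracy on large systems depends strongly on the node ordering and the data.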
Remark V.2
A main challenge in this work is that we assume operations over the real numbers. Over finite fields, arithmetic operations can always be performed without error. Although this fact may suggest a simple solution to the numerical stability of real-valued computations, namely rounding the computation’s inputs to elements of a finite field and performing the computation over that field, such a solution has limited applicability, especially for inputs with a wide range, for the following reason. Since performing arithmetic operations over a finite field requires representing each element of as an element of through a bit representation, this solution is applicable on machines with fixed-point operations and word sizes of at least . However, it is not applicable on machines with floating-point operations, since a floating-point representation cannot represent every intermediate value between its minimum and maximum representable values. This is a drawback of floating-point representation relative to fixed-point representation, even though, for the same word size, floating-point representation can cover a wider range of values.
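A two-line illustration of the representability issue raised in this remark, using Python integers as a stand-in for exact (fixed-point or finite-field style) arithmetic and Python floats for IEEE 754 double precision: above 2^53, consecutive integers are no longer all representable as doubles, so adding 1 can be lost to rounding. The function name is hypothetical.

```python
def representability_gap(exponent=53):
    """Compare exact integer arithmetic with IEEE 754 double arithmetic near 2**exponent."""
    big = 2 ** exponent
    exact = (big + 1) - big                     # Python ints are arbitrary precision
    rounded = (float(big) + 1.0) - float(big)   # doubles carry a 53-bit significand
    return exact, rounded

print(representability_gap(52))  # (1, 1.0): 2**52 + 1 is representable
print(representability_gap(53))  # (1, 0.0): 2**53 + 1 rounds back to 2**53
```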
VI OrthoPoly: Low Communication/Computation Numerically Stable Codes for Distributed Matrix Multiplication
While MatDot Codes [3] have an optimal recovery threshold of , they have a higher computation cost per worker and a higher worker-to-fusion-node communication cost than Polynomial Codes [2]. In this section, motivated by the condition number bound in Theorem V.1, we use Chebyshev polynomials to construct a numerically stable code for matrix multiplication that has the same low communication/computation costs as Polynomial Codes, as well as the same recovery threshold. However, as shown in this section, our proposed codes, denoted OrthoPoly, provide lower numerical errors than Polynomial Codes. We follow the same system model as in Section IV-A1 and solve the problem formulated in Section VI-A. We provide a motivating example in Section VI-B and the general code construction in Section VI-C. Finally, in Section VI-D, we show experimentally that OrthoPoly Codes achieve lower numerical errors than Polynomial Codes.
VI-A Problem Formulation
The master node possesses two real-valued input matrices , with dimensions , , respectively. Every worker node receives from the master node an encoded matrix of of dimension and an encoded matrix of of dimension , and performs matrix multiplication of these two received inputs. Upon performing the matrix multiplication, each worker node sends the result to the fusion node. The fusion node needs to recover the matrix multiplication once it receives the results of any worker nodes.
VI-B Example
Consider computing the matrix multiplication , for some two real matrices of dimensions and , respectively, over a distributed system of workers such that:
1. Each worker receives an encoded matrix of of dimension , and an encoded matrix of of dimension .
2. The product can be recovered by the fusion node given the results of any worker nodes.
One solution is as follows. First, the matrices can be partitioned as
[TABLE]
where, for any , has dimension , and has dimension . Next, let
[TABLE]
Now, can be written as
[TABLE]
Since is a degree polynomial, once the fusion node receives the output of any workers, it can interpolate , i.e., obtain its matrix coefficients, which we denote by . Specifically, for any , let be the matrix coefficient of in . Now, recalling (49), the product can be written as
[TABLE]
Although the obtained set of matrix coefficients is not equal to , the ’s are linear combinations of the ’s. Specifically, for any , , let be its -th entry and, for any , let be the -th entry of the product ; then we can write
[TABLE]
for any . Thus, the products can be obtained by computing
[TABLE]
for all . In the following, we provide the general code construction.
VI-C OrthoPoly Code Construction
We assume that matrix is split horizontally into equal sub-matrices, of dimension each, and matrix is split vertically into equal sub-matrices, of dimension each, as follows:
[TABLE]
and define two encoding polynomials and , and let . Next, we describe the idea of the general code construction. First, for all , the master node sends to the -th worker the evaluations of and at , that is, it sends and to the -th worker. Next, for every , the -th worker node computes the matrix product and sends the result to the fusion node. Once the fusion node receives the output of any worker nodes, it interpolates . Next, the fusion node recovers the products , from the matrix coefficients of using a low-complexity matrix-vector multiplication, specified later in Construction 3. We formally present our OrthoPoly Codes in Construction 3 and explain its notation in the following. The output of the algorithm is the matrix , where the -th block of is the matrix , and the -th entry of any matrix is . The -th entry of the matrix polynomial is denoted as , and Section III-B defines the matrices and for any subset . In addition, is an matrix of the form $\mathbf{H}=\begin{pmatrix}\mathbf{H}_{0} & \mathbf{H}_{1} & \cdots & \mathbf{H}_{n-1}\end{pmatrix}$, where is an matrix with ones on the main diagonal and zeros elsewhere, and, for any , is an matrix with the following structure
[TABLE]
where the value in the first column is at the -th row of .
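The decoding idea of this construction (interpolate p_C in the Chebyshev basis, then undo the linear mixing of the products) can be sketched for general m and n. The sketch rests on assumptions that are ours, not taken from Construction 3: the encoding polynomials are p_A(x) = sum_i A_i T_i(x) and p_B(x) = sum_j B_j T_{jm}(x), mirroring the exponents of Polynomial Codes; the workers use the P-point Chebyshev grid; and instead of the structured matrix H we build the mixing matrix directly from the identity T_a T_b = (T_{a+b} + T_{|a-b|})/2 and solve it with a dense solver. Names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_grid(n):
    """n-point Chebyshev grid cos((2i-1)pi/(2n)), i = 1..n."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def orthopoly(A, B, m, n, P, survivors):
    A_blk = np.split(A, m, axis=0)            # A split horizontally into m row blocks
    B_blk = np.split(B, n, axis=1)            # B split vertically into n column blocks
    K = m * n                                 # recovery threshold
    rho = cheb_grid(P)
    T = C.chebvander(rho, K - 1)              # T[r, d] = T_d(rho_r)
    out = {}
    for r in survivors:
        pA = sum(T[r, i] * A_blk[i] for i in range(m))        # assumed p_A = sum_i A_i T_i
        pB = sum(T[r, j * m] * B_blk[j] for j in range(n))    # assumed p_B = sum_j B_j T_{jm}
        out[r] = pA @ pB                      # worker r returns p_C(rho_r)
    S = list(survivors)[:K]
    Y = np.stack([out[r] for r in S]).reshape(K, -1)
    coef = np.linalg.solve(T[S], Y)           # Chebyshev coefficients of p_C (degree K-1)
    # T_i T_{jm} = (T_{i+jm} + T_{|i-jm|}) / 2, hence coef = M @ [vec(A_i B_j)].
    M = np.zeros((K, K))
    for i in range(m):
        for j in range(n):
            M[i + j * m, i + j * m] += 0.5
            M[abs(i - j * m), i + j * m] += 0.5
    prods = np.linalg.solve(M, coef)          # row i + j*m holds vec(A_i B_j)
    rb, cb = A.shape[0] // m, B.shape[1] // n
    return np.block([[prods[i + j * m].reshape(rb, cb) for j in range(n)]
                     for i in range(m)])
```

Because the leading term of T_i T_{jm} is T_{i+jm} and the indices i + jm are all distinct, the mixing matrix M is upper triangular with a nonzero diagonal, hence invertible; a triangular solve (or the structured H of Construction 3) would be cheaper than the dense solve used here.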
VI-C1 Complexity Analyses of OrthoPoly
Encoding Complexity: Encoding for each worker requires computing two linear combinations: the first adds scaled matrices of size , and the second adds scaled matrices of size , for an overall per-worker encoding complexity of . Therefore, the overall computational complexity of encoding for workers is .
Computational Cost per Worker: Each worker multiplies two matrices of dimensions and , requiring operations.
Decoding Complexity: Since has degree , the interpolation of requires the inversion of a matrix, with complexity , and performing matrix-vector multiplications, each between the inverted matrix and a column vector of length of the received evaluations of the matrix polynomial at some position , with complexity . Thus, assuming that , the overall decoding complexity is .
Communication Cost: The master node sends symbols, and the fusion node receives symbols from the successful worker nodes.
Remark VI.1
With the reasonable assumption that the dimensions of the input matrices are large enough such that , we can conclude that the encoding and decoding costs at the master and fusion nodes, respectively, are negligible compared to the computation cost at each worker node.
VI-D Numerical Results
In our experiments, the entries of the input matrices are chosen independently according to the standard Gaussian distribution . In addition, for any two input matrices , let be the output of the distributed system; we define the relative error between and as
[TABLE]
Fig. 10 shows how the maximum relative error (the worst-case relative error over all scenarios of successful nodes, for a fixed number of parity workers) grows with the size of the distributed system for both Construction 3 and Polynomial Codes. In Fig. 10, we plot the average result of five different realizations of the system at each system size . The figure shows that Polynomial Codes have unacceptable relative errors once the size of the system exceeds workers, reaching a relative error of around . In contrast, OrthoPoly supports systems with up to worker nodes while incurring a relative error of only .
VII Generalized OrthoMatDot: Numerically Stable Codes for Matrix Multiplication with Communication/Computation-Recovery Threshold Trade-off
Although MatDot Codes [3] have a lower recovery threshold, , than Polynomial Codes [2], whose recovery threshold is , MatDot Codes have a higher worker-to-fusion-node communication cost and a higher computation cost per worker than Polynomial Codes. Codes proposed in [4, 5, 26] offer a trade-off between the communication/computation cost and the recovery threshold. However, all of these codes are based on the “ill-conditioned” monomial basis. In this section, we offer a numerically stable code construction, denoted Generalized OrthoMatDot, that provides a trade-off between communication/computation costs and recovery threshold. For the same communication/computation cost, our construction has a recovery threshold higher than that of the codes of [5, 26] by a factor of at most . We provide the formal problem statement in Section VII-A, describe an example of our construction in Section VII-B, provide the general code construction in Section VII-C, and describe our numerical experiments in Section VII-D.
VII-A System Model and Problem Formulation
We consider the same system model and problem formulation as in Section IV-A with the following change: We assume that the master node is allowed to send an encoded fraction of matrix , and an encoded fraction of matrix , where and are not necessarily equal, and and are split as follows
[TABLE]
where divide , respectively, and . In addition, we assume that each worker node receives a linear combination of sub-matrices , and another linear combination of sub-matrices .
Remark VII.1
Although the Generalized OrthoMatDot construction offered in this section has lower condition numbers than the codes in [5, 26], its recovery threshold is higher than that of these codes by a factor of at most . Specifically, Generalized OrthoMatDot Codes have a recovery threshold of , while both codes in [5, 26] have a recovery threshold of . This increase stems from the fact that Generalized OrthoMatDot Codes are based on Chebyshev polynomials, which have the following property: for any , . This property leads to a larger number of undesired terms in the product of the encoding polynomials . To avoid combining undesired and desired terms at the same degree, higher-degree Chebyshev polynomials have to be used in , yielding a higher recovery threshold. It remains an open question whether the recovery threshold of [5, 26] can be achieved using orthonormal polynomials.
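The product rule behind these extra undesired terms is the standard identity T_i(x) T_j(x) = (T_{i+j}(x) + T_{|i-j|}(x))/2, in contrast with x^i x^j = x^{i+j} in the monomial basis, where each product contributes a single term. Below is a quick numerical check using NumPy's Chebyshev-series multiplication (`numpy.polynomial.chebyshev.chebmul`); the helper name is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def product_identity_holds(i, j):
    """Check T_i * T_j == (T_{i+j} + T_{|i-j|}) / 2 via Chebyshev-series multiplication."""
    t_i = np.eye(i + 1)[i]          # Chebyshev coefficients of T_i
    t_j = np.eye(j + 1)[j]          # Chebyshev coefficients of T_j
    lhs = C.chebmul(t_i, t_j)
    rhs = np.zeros(i + j + 1)
    rhs[i + j] += 0.5
    rhs[abs(i - j)] += 0.5          # if i == 0 or j == 0, both halves land on T_{i+j}
    return lhs.shape == rhs.shape and np.allclose(lhs, rhs)

# Each product of two nonconstant Chebyshev terms spreads over two degrees.
print(all(product_identity_holds(i, j) for i in range(10) for j in range(10)))  # True
```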
VII-B Example
Consider computing the matrix multiplication , for some two real matrices of dimensions and , respectively, over a distributed system of workers such that:
1. Each worker receives an encoded matrix of of dimension , and an encoded matrix of of dimension .
2. The product can be recovered by the fusion node given the results of any worker nodes.
One solution is as follows. First, the matrices can be partitioned as
[TABLE]
where, for , has dimension , and has dimension . Next, let
[TABLE]
where are to be specified next, and let be distinct real numbers in the range . For each worker node , the master node sends .
Now, to determine the best values for , we expand the polynomial in the Chebyshev basis and then make some observations.
[TABLE]
Using the property of the Chebyshev polynomials that for any , , (VII-B) can be rewritten as
[TABLE]
Now, note the following regarding (VII-B):
- (i) is the coefficient of ,
- (ii) is the coefficient of ,
- (iii) is the coefficient of ,
- (iv) is the coefficient of .
Since has degree and is evaluated at a distinct value at each worker node, once the fusion node receives the output of any worker nodes, it can interpolate and extract the product (i.e., the matrix coefficients of , ). We now pick values for that minimize the degree of and, hence, the recovery threshold. These values must be chosen such that the desired coefficients in (i)-(iv) remain separate, that is, none of them is combined with another desired term or with an undesired term. This constraint leads to the following two inequalities:
[TABLE]
which implies that . Next, we provide our general code construction for the Generalized OrthoMatDot Codes.
VII-C Generalized OrthoMatDot Code Construction
Theorem VII.1
For the matrix multiplication problem described in Section VII-A computed on the system defined in Section IV-A1, there exists a coding strategy with recovery threshold
[TABLE]
Notice that the problem specified in Section VII-A restricts the output matrix of each worker node to be of dimension , for some positive integers that divide , respectively. This is smaller than the dimensions of the output matrix of each worker node according to the problem specified in Section IV-A2 (i.e., ) by a factor of . However, according to Theorem VII.1, this communication advantage, when or , comes at the expense of a higher recovery threshold compared to OrthoMatDot Codes.
Remark VII.2 (Notation)
For ease of exposition in the remainder of this section, we use to denote , respectively.
In order to prove Theorem VII.1, we first present a code construction that achieves the recovery threshold in (VII.1), then we prove that the presented code construction is valid. First, note that in the Generalized OrthoMatDot code construction, we assume that the two input matrices are split as in (115). Also, note that given this partitioning of input matrices, we can write , where is written as
[TABLE]
and each of has dimension and can be expressed as for any and . Also, let be distinct real numbers in the range , and define encoding polynomials
[TABLE]
and let . Notice that is a matrix polynomial of degree .
Claim VII.2
For any and , is the matrix coefficient of in ,
The proof of this claim is in Appendix C.
We describe, next, the idea of our proposed Generalized OrthoMatDot code construction. First, for all , the master node sends to the -th worker evaluations of and at , that is, it sends and to the -th worker. Next, for every , the -th worker node computes the matrix product and sends the result to the fusion node. Once the fusion node receives the output of any worker nodes, it interpolates .
We formally present our Generalized OrthoMatDot code construction in Construction 4. In the following, we explain the notation used in Construction 4. The output of the algorithm is the matrix where the -th block of is the matrix , and the -th entry of any matrix is . The -th entry of the matrix polynomial is denoted as , and Section III-B defines matrices and , for any subset .
Now, we prove Theorem VII.1.
Proof:
To prove the theorem, it suffices to prove that Construction 4 is valid. Since has degree and every worker node sends an evaluation of at a distinct point, once the fusion node receives the outputs of any worker nodes, it can interpolate (i.e., obtain all its matrix coefficients). These include the coefficients of for all and , i.e., , for all and (Claim VII.2), which completes the proof. ∎
Next, we provide the different complexity analyses of the Generalized OrthoMatDot Codes.
VII-C1 Complexity Analyses of Generalized OrthoMatDot
Encoding Complexity: Encoding for each worker requires computing two linear combinations: the first adds scaled matrices of size , and the second adds scaled matrices of size , for an overall per-worker encoding complexity of . Therefore, the overall computational complexity of encoding for workers is .
Computational Cost per Worker: Each worker multiplies two matrices of dimensions and , requiring operations.
Decoding Complexity: Since has degree , the interpolation of requires the inversion of a matrix, with complexity , and performing matrix-vector multiplications, each between the inverted matrix and a column vector of length of the received evaluations of the matrix polynomial at some position , with complexity . Thus, assuming that , the overall decoding complexity is .
Communication Cost: The master node sends symbols, and the fusion node receives symbols from the successful worker nodes.
Remark VII.3
With the reasonable assumption that the dimensions of the input matrices are large enough such that , we can conclude that the encoding and decoding costs at the master and fusion nodes, respectively, are negligible compared to the computation cost at each worker node.
VII-D Numerical Results
In our experiments on Construction 4, we considered distributed systems with worker nodes. Fig. 11 shows that, for every examined system, the condition number of the interpolation matrix of the Generalized OrthoMatDot Codes is lower than that of its counterpart codes in [5, 26]. The results in Fig. 11 also show that, for a given system, as the partitioning factor decreases (i.e., as the redundancy in worker nodes increases), the stability of the Generalized OrthoMatDot code construction decreases; however, it remains better than that of the monomial-basis codes in all cases.
VIII Numerically Stable Lagrange Coded Computing
In this section, we study the numerical stability of Lagrange coded computing [12], which extends coded computing beyond matrix-vector and matrix-matrix multiplication to multivariate polynomial computations. As shown in [12], Lagrange coded computing has applications in gradient coding, privacy, and secrecy. Our main contribution here is a numerically stable approach to Lagrange coded computing inspired by Theorem V.1. In particular, our approach involves (a) a careful choice of evaluation points, and (b) a careful decoding algorithm that inverts the appropriate Chebyshev-Vandermonde matrix. We describe the system model in Section VIII-A, overview the Lagrange coded computing technique of [12] in Section VIII-B, describe our numerically stable approach in Section VIII-C, and present the results of our numerical experiments in Section VIII-D.
VIII-A System Model and Problem Formulation
We consider, for this section, the distributed computing framework depicted in Fig. 12, which is used in [12] and consists of a master node, worker nodes, and a fusion node, where the only communication allowed is from the master node to the worker nodes and from the worker nodes to the fusion node. The worker nodes have prior knowledge of a polynomial function of interest of degree , where . In addition, the master node possesses a set of data points , where , . For every worker node , the master node is allowed to send some encoded vector . Once a worker node receives its encoded vector, it evaluates at this vector and sends the evaluation to the fusion node. That is, for , worker node receives on its input, evaluates , and sends the result to the fusion node. Finally, the fusion node is expected to decode the set of evaluations in a numerically stable manner after it receives the outputs of any worker nodes.
VIII-B Background on Lagrange Coded Computing
In this section, we review the baseline Lagrange coded computing method introduced in [12], considering the framework in Section VIII-A. Although the method in [12] is more general, here, for simplicity, we limit our discussion to systematic Lagrange coded computing. That is, we assume that, for , worker node receives the -th data point from the master node; in other words, we assume that . The encoding procedure is as follows. First, letting be distinct real values, an encoding function is defined as:
[TABLE]
Given this encoding function, the master node sends the encoded vector to worker node , for every . Notice that the encoding function indeed leads to a systematic encoding, since for all . Every worker node computes upon receiving , and sends the result to the fusion node. The fusion node waits until it receives the outputs of any . Since has degree in , the fusion node is able to interpolate after receiving the outputs of any , i.e., , worker nodes. Since , the fusion node then evaluates to obtain the desired set of evaluations.
VIII-C Numerically Stable Lagrange Coded Computing
Lagrange coded computing requires an interpolation at the fusion node to recover the polynomial . Performing this interpolation by obtaining the coefficients of the polynomial in a monomial basis requires inverting a square Vandermonde matrix, which is numerically unstable. Noting that the first Chebyshev polynomials also form a basis for polynomials of degree , we provide an alternative decoding procedure whose key idea is to find the coefficients of the polynomial in the basis of Chebyshev polynomials. Our decoding procedure therefore involves inverting the Chebyshev-Vandermonde matrix .⁴ Guided by Theorem V.1, we choose the evaluation points to be the -point Chebyshev grid, obtaining a decoding procedure that is more stable than one that uses the monomial basis.
⁴ Since both systematic and non-systematic Lagrange coded computing require the inversion of the same Chebyshev-Vandermonde matrix, our numerically stable decoding procedure in Construction 5 naturally extends to non-systematic Lagrange coded computing. The only difference is the last step of evaluating at : in the non-systematic case, is instead evaluated at some predefined values such that for all .
Our numerically stable algorithm for Lagrange coded computing is formally described in Construction 5.
In the following, we explain the notation used in Construction 5. We denote the polynomial at the -th entry of by and write it as . Following the notation in Section III-B, we use the Chebyshev-Vandermonde matrices and ; for any subset , we also define the matrix . Finally, we assume that our construction returns as output the set of evaluations , where, for each , we have ; here, for every , and would be the same if the machine had infinite precision.
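The decoding step can be sketched as follows, with the evaluation points on the P-point Chebyshev grid and the interpolation done either in the Chebyshev basis, in the spirit of Construction 5, or, for comparison, in the monomial basis. The following choices are ours, not the paper's: systematic encoding with beta_j = alpha_j for j = 1, ..., k, a dense solve for the interpolation, and a hypothetical degree-2 function f(x) = x (v^T x) in the usage example. Function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_grid(n):
    """n-point Chebyshev grid cos((2i-1)pi/(2n)), i = 1..n."""
    i = np.arange(1, n + 1)
    return np.cos((2 * i - 1) * np.pi / (2 * n))

def lagrange_encode(X, beta, alpha):
    """Row r is u(alpha_r) = sum_j X_j * l_j(alpha_r), with l_j the Lagrange basis on beta."""
    k = len(beta)
    L = np.ones((len(alpha), k))
    for j in range(k):
        for l in range(k):
            if l != j:
                L[:, j] *= (alpha - beta[l]) / (beta[j] - beta[l])
    return L @ X

def lcc_decode(alpha_S, Y, beta, degree, basis="chebyshev"):
    """Interpolate f(u(z)) of the given degree from degree+1 outputs, then evaluate at beta."""
    if basis == "chebyshev":
        V, W = C.chebvander(alpha_S, degree), C.chebvander(beta, degree)
    elif basis == "monomial":
        V = np.vander(alpha_S, degree + 1, increasing=True)
        W = np.vander(beta, degree + 1, increasing=True)
    else:
        raise ValueError(f"unknown basis: {basis}")
    coeffs = np.linalg.solve(V, Y)    # one column of coefficients per output coordinate
    return W @ coeffs                 # row j approximates f(X_j)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, P, d = 4, 10, 5
    X, v = rng.standard_normal((k, d)), rng.standard_normal(d)
    f = lambda x: x * (x @ v)                         # hypothetical degree-2 function
    alpha = cheb_grid(P)
    beta = alpha[:k]                                  # systematic: beta_j = alpha_j
    outs = np.array([f(u) for u in lagrange_encode(X, beta, alpha)])
    S = [0, 2, 3, 5, 6, 8, 9]                         # any 2(k-1)+1 = 7 workers
    est = lcc_decode(alpha[S], outs[S], beta, 2 * (k - 1))
    print(np.max(np.abs(est - np.array([f(x) for x in X]))))
```

With k = 4 data points and a degree-2 f, f(u(z)) has degree 2(k-1) = 6, so any 7 of the P = 10 outputs suffice. Switching `basis="monomial"` reproduces the Vandermonde-based baseline, whose accuracy degrades much faster as P grows.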
In the following, we show through numerical experiments the stability of our proposed Construction 5.
VIII-D Numerical Results
In our experiments, we assume a distributed system of worker nodes and data points (input vectors) , each of dimension , where each entry of every input vector is drawn independently according to the standard Gaussian distribution . The function of interest in this system is , where is some -dimensional vector with entries drawn independently according to the standard Gaussian distribution . We compare Construction 5, where the Chebyshev basis is used for interpolation, with the case where the monomial basis is used for interpolation instead. Letting be the system’s output vector and be the correct output vector, we define the relative error between and as
[TABLE]
The results, shown in Fig. 13, illustrate that using the Chebyshev basis for interpolation yields lower relative error (i.e., higher stability) than the monomial basis at every system size. Fig. 13 also shows that, under a given relative error constraint, Construction 5 scales better than the monomial-basis construction. Specifically, if a relative error of up to can be tolerated, Fig. 13 shows that the monomial-basis interpolation construction can only support systems with fewer than worker nodes, whereas, for the same relative error constraint, Construction 5 can support systems with up to worker nodes.
IX Concluding Remarks
In this paper, we develop numerically stable codes for matrix-matrix multiplication and Lagrange coded computing. A distinctive characteristic of our work is the infusion of principles of numerical approximation theory into coded computing, with the end goal of numerical stability. In particular, our work is marked by the use of orthogonal polynomials for encoding, Gauss quadrature techniques for decoding, and new bounds on the condition number of Chebyshev-Vandermonde matrices. Notably, our constructions obtain the same recovery threshold as MatDot Codes and Polynomial Codes for matrix multiplication, as well as for Lagrange coded computing. However, our construction in Section VII obtains a weaker (higher) recovery threshold than previous constructions [26, 5] for coded matrix multiplication when the computation/communication cost is constrained to be lower than that of MatDot Codes. The search for numerically stable codes for this application with the same recovery threshold as [26, 5] remains open.
While our paper focuses on applications where polynomial-based encoding is particularly useful, our results may be useful for other applications as well. For instance, for the simple matrix-multiplication problem performed in a distributed setting over worker nodes, where the goal is to encode such that each worker stores a partition of matrix , it is well known that MDS-type codes can be used [13, 27]. Specifically, let and let be an matrix in which every submatrix has full rank . Then the -th worker, for , can compute the product , and can be recovered from the outputs of any of the nodes. The natural, Reed-Solomon-inspired solution of choosing to be a Vandermonde matrix is ill-conditioned over the real numbers. Note, however, that unlike in the matrix multiplication problem, the matrix does not need to have a polynomial structure. Indeed, choosing to be a random Gaussian matrix leads to well-conditioned solutions with high probability. In particular, the following result follows from elementary arguments that build on [31].
Theorem IX.1
Let be an matrix, , and let the entries of be independent and identically distributed standard Gaussian random variables. Then,
[TABLE]
The theorem, which is proved in Appendix D, formally demonstrates that, for a fixed number of redundant workers, the worst-case condition number grows as with high probability. However, the random Gaussian matrix approach has two drawbacks: (i) for a given realization of the random variables, it is difficult to verify whether the matrix is well-conditioned, and (ii) the lack of structure could lead to more complex decoding. Our result in Theorem V.1 also indicates that choosing , i.e., choosing to be a Chebyshev-Vandermonde matrix, naturally provides a well-conditioned solution to this problem. Another solution to the matrix-vector multiplication problem is provided in [25] via universally decodable matrices [32]; in that work, numerical stability is demonstrated empirically.
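A small sketch of the MDS-style matrix-vector scheme with a random Gaussian generator, as discussed above. The split of A into k row blocks, the dense solve at the decoder, and the brute-force worst-case condition number are our illustrative choices, and all names are hypothetical. Note that `worst_case_cond` has to enumerate all C(P, k) column subsets, which is exactly why drawback (i) becomes costly at scale.

```python
import itertools
import numpy as np

def gaussian_generator(k, P, seed=0):
    """k x P generator with i.i.d. standard Gaussian entries."""
    return np.random.default_rng(seed).standard_normal((k, P))

def encode(A, G):
    """Split A into k row blocks A_j; worker r stores sum_j G[j, r] * A_j."""
    k, P = G.shape
    blocks = np.split(A, k, axis=0)
    return [sum(G[j, r] * blocks[j] for j in range(k)) for r in range(P)]

def decode(G, results, survivors):
    """Recover A @ x from the products (encoded block) @ x of any k workers."""
    k = G.shape[0]
    S = list(survivors)[:k]
    Y = np.stack([results[r] for r in S])      # row s equals sum_j G[j, S[s]] * (A_j @ x)
    X = np.linalg.solve(G[:, S].T, Y)          # row j equals A_j @ x
    return X.reshape(-1)

def worst_case_cond(G):
    """Brute-force maximum 2-norm condition number over all k x k column sub-matrices."""
    k, P = G.shape
    return max(np.linalg.cond(G[:, list(c)]) for c in itertools.combinations(range(P), k))
```

For example, with a 4 x 7 generator, `decode(G, results, [0, 2, 5, 6])` recovers A @ x from any four of the seven workers, where `results[r] = encode(A, G)[r] @ x`.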
It is, however, important to note that the problems addressed in this paper are more demanding. Matrix multiplication codes, where both matrices are encoded so that their product can be recovered, require much more structure than matrix-vector multiplication codes, where only one matrix is encoded. For instance, random Gaussian encoding does not naturally yield a recovery threshold of for matrix multiplication, and it is not clear whether the solution of [25] is applicable either. The utility of Chebyshev-Vandermonde matrices for a variety of coded computing problems, including matrix-vector multiplication, matrix multiplication, and Lagrange coded computing, motivates the study of low-complexity decoding and error correction mechanisms for these systems.
Appendix A Proof of Claim IV.2
We have,
[TABLE]
In addition, noting that (i.e., ) is of degree (less than ), Theorem III.2 implies that
[TABLE]
Finally, combining the two preceding displays completes the proof.
Appendix B Proof of Theorem V.1
We use the following trigonometric identity in our proof.
Lemma B.1
For , let be chosen as (10). Then
Proof:
Note that . Therefore,
[TABLE]
where denotes the derivative of . Using the above, we obtain the desired result. ∎
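The proof of Lemma B.1 starts from the factorization of $T_P$ over its roots. The sketch below is a numerical check under stated assumptions, not the lemma itself. It assumes (10) places the nodes at the roots $\rho_i = \cos\theta_i$, $\theta_i = (2i-1)\pi/(2P)$, of $T_P$. It then verifies the standard identity that this factorization yields, $\prod_{j \ne i} (\rho_i - \rho_j) = T_P'(\rho_i)/2^{P-1} = (-1)^{i-1} P / (2^{P-1} \sin\theta_i)$, for one arbitrary value of $P$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

P = 10
i = np.arange(1, P + 1)
theta = (2 * i - 1) * np.pi / (2 * P)
rho = np.cos(theta)                      # assumed nodes: the roots of T_P

# Left side: prod_{j != i} (rho_i - rho_j), computed directly from pairwise differences.
diff = rho[:, None] - rho[None, :]
np.fill_diagonal(diff, 1.0)              # drop the j == i factor from each row product
lhs = diff.prod(axis=1)

# T_P(x) = 2^(P-1) prod_j (x - rho_j), so the same product equals T_P'(rho_i) / 2^(P-1).
t_P = np.zeros(P + 1)
t_P[P] = 1.0                             # Chebyshev coefficients of T_P
rhs_deriv = C.chebval(rho, C.chebder(t_P)) / 2 ** (P - 1)

# Closed form via T_P'(cos t) = P sin(P t) / sin t and sin(P theta_i) = (-1)^(i-1).
closed_form = (-1.0) ** (i - 1) * P / (2 ** (P - 1) * np.sin(theta))

print(np.max(np.abs(lhs - closed_form) / np.abs(closed_form)))
```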
Proof:
We show that any square sub-matrix of formed by any columns of satisfies the bound stated in the theorem. Let be a subset of such that , for some , and define to be the square submatrix of after removing the columns with indices in . Recalling the structure of from (32), we can write it as
[TABLE]
Moreover, for any such that , we can write
[TABLE]
where , with and . Now, notice that , and for any . Therefore, we have
[TABLE]
In the following, we obtain an upper bound on . Let be the -th Lagrange polynomial associated with , that is,
[TABLE]
Since has a degree of , it can be written in terms of the Chebyshev basis as
[TABLE]
for some real coefficients . Now, from (146), note the following property regarding :
[TABLE]
Using this property and observing (147), we conclude that, for any , . Therefore,
[TABLE]
where is the identity matrix. That is,
[TABLE]
Therefore,
[TABLE]
In addition, we have that
[TABLE]
From (159) and (B), we conclude that .
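The steps above identify the inverse of a square Chebyshev-Vandermonde matrix with the Chebyshev-basis coefficients of the Lagrange polynomials. The minimal sketch below illustrates that identity numerically. For $m$ distinct nodes, it builds each Lagrange polynomial $\ell_j$ in the Chebyshev basis with numpy's `chebfromroots` and stacks the coefficients as rows of a matrix. It then checks that this matrix inverts the matrix with entries $T_k(x_j)$. The convention `V[k, j] = T_k(x_j)`, the grid, and the surviving-node set `S` are assumptions for illustration and may differ from the layout in (32).

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def lagrange_in_chebyshev_basis(x: np.ndarray) -> np.ndarray:
    """Row j holds the Chebyshev coefficients of the Lagrange polynomial l_j on nodes x."""
    m = len(x)
    A = np.empty((m, m))
    for j in range(m):
        others = np.delete(x, j)
        # prod_{k != j} (t - x_k) in the Chebyshev basis, normalized so that l_j(x_j) = 1
        A[j] = C.chebfromroots(others) / np.prod(x[j] - others)
    return A


P, m = 12, 9
rho = np.cos((2 * np.arange(1, P + 1) - 1) * np.pi / (2 * P))  # assumed Chebyshev grid
S = [0, 1, 2, 4, 5, 7, 8, 10, 11]   # hypothetical set of m surviving workers
x = rho[S]

V = C.chebvander(x, m - 1).T        # V[k, j] = T_k(x_j), k = 0..m-1
A = lagrange_in_chebyshev_basis(x)
print(np.allclose(A @ V, np.eye(m)))  # True, since (A @ V)[j, i] = l_j(x_i)
```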
Now, we express the integral in Gauss quadrature form using the roots of . Note that this is a “trick” we use in the proof: it is possible to use the Gauss quadrature formula over nodes to express the integral of the degree polynomial . However, using nodes instead of nodes leads to a simple, tractable bound for . Now, we can write
[TABLE]
for some constants . Moreover, for the Chebyshev polynomials of the first kind are, in fact, all equal to . Therefore, we have
[TABLE]
and, consequently,
[TABLE]
Now, from (146), note that has the following evaluations
[TABLE]
Therefore, (163) can be written as
[TABLE]
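The quadrature step relies on the $P$-point Gauss-Chebyshev rule. Its nodes are the roots of $T_P$, and its weights are all equal to $\pi/P$. It integrates $p(x)/\sqrt{1-x^2}$ over $[-1, 1]$ exactly whenever $\deg p \le 2P - 1$. The sketch below checks both facts with numpy's `chebgauss`. It also shows the rule failing one degree higher, on $T_{2P}$. The value of $P$ and the random test polynomial are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

P = 8
nodes, weights = C.chebgauss(P)     # nodes are the roots of T_P

# A random polynomial of degree 2P - 1, stored by its Chebyshev coefficients.
c = np.random.default_rng(1).standard_normal(2 * P)
quad = np.sum(weights * C.chebval(nodes, c))
# Exact weighted integral: only T_0 contributes, since int T_k / sqrt(1 - x^2) = 0 for k >= 1.
exact = np.pi * c[0]

# One degree too high: T_{2P}(rho_i) = cos((2i - 1) * pi) = -1 at every node,
# so the rule returns -pi although the exact weighted integral is 0.
t_2P = np.zeros(2 * P + 1)
t_2P[-1] = 1.0
quad_too_high = np.sum(weights * C.chebval(nodes, t_2P))

print(np.allclose(weights, np.pi / P), np.isclose(quad, exact), quad_too_high)
```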
To obtain our upper bound on , we next bound the term in (B). Notice that can be written as
[TABLE]
where the last equality follows from Lemma B.1. Moreover, the product in (B) can be written as
[TABLE]
where the last equality follows from Lemma B.1. Now, substituting from (B) in (B) yields
[TABLE]
Using (B) in (B), we conclude that
[TABLE]
Finally, combining (145) and (172), we conclude that
[TABLE]
∎
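Theorem V.1 bounds the condition number of every square submatrix obtained by deleting a fixed number of columns. A quick empirical look fixes the number of deleted columns at $P - m = 2$ and records the worst case over all column choices as $P$ grows. The sketch assumes the Chebyshev grid is the set of roots of $T_P$ and that rows are indexed by $T_0, \dots, T_{m-1}$. With no deletions ($m = P$), discrete orthogonality makes the rows mutually orthogonal with squared norms $P$ (for $T_0$) and $P/2$. The condition number is therefore exactly $\sqrt{2}$, and the sketch checks that as well. It reports values only and makes no claim about the constants in the theorem.

```python
import itertools

import numpy as np
from numpy.polynomial import chebyshev as C


def chebyshev_vandermonde(m: int, P: int) -> np.ndarray:
    """m x P matrix with entries T_k(rho_j), where rho_j are the roots of T_P (assumed grid)."""
    rho = np.cos((2 * np.arange(1, P + 1) - 1) * np.pi / (2 * P))
    return C.chebvander(rho, m - 1).T


def worst_case_cond(G: np.ndarray) -> float:
    """Largest 2-norm condition number over all m x m column submatrices of G."""
    m, P = G.shape
    return max(np.linalg.cond(G[:, list(cols)])
               for cols in itertools.combinations(range(P), m))


# No deletions: orthogonal rows with squared norms P (T_0) and P/2 give kappa = sqrt(2).
full_kappa = np.linalg.cond(chebyshev_vandermonde(16, 16))

# Fixed redundancy P - m = 2: worst case over all C(P, 2) deletion patterns.
redundancy = 2
sizes = [8, 12, 16, 24, 32]
worst = [worst_case_cond(chebyshev_vandermonde(P - redundancy, P)) for P in sizes]
for P, kappa in zip(sizes, worst):
    print(f"P = {P:2d}: worst-case condition number {kappa:.3e}")

# Rough growth exponent from a log-log fit, for comparison with polynomial growth in P.
slope = np.polyfit(np.log(sizes), np.log(worst), 1)[0]
print(f"fitted exponent ~ {slope:.2f}; no-deletion condition number {full_kappa:.6f}")
```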
Appendix C Proof of Claim VII.2
Let . in (VII-C) can be written as
[TABLE]
Similarly, in (VII-C) can be written as
[TABLE]
Now, the product can be written as
[TABLE]
where,
[TABLE]
and,
[TABLE]
Now, in order to prove the claim, it suffices to prove the following two statements:
1) For any , is the matrix coefficient of in .
2) For any , the matrix coefficient of in is , where is the all-zeros matrix.
In the following, we prove that statement 1) is true. In order to find the coefficient of in , we find the set . Rewriting , we have
[TABLE]
(177) implies that . Suppose ; then for some integer . However, this is a contradiction since , for any . Now, (177) can be written as
[TABLE]
Again, (178) implies . Suppose ; then , for some integer . However, this is a contradiction since . Now, since , (178) implies . Thus, . That is, for any , the matrix coefficient of in is .
Now, it remains to prove statement 2). That is, for any , the matrix coefficient of in is . In order to find the coefficient of in , we find the sets , and .
First, for the set , rewriting , we get
[TABLE]
From (179), we conclude that . Otherwise, , for some integer , a contradiction since . Since and both are non-negative, we conclude that . Moreover, now (179) reduces to
[TABLE]
Again, since , we conclude that , which implies that . Since and both are non-negative, we conclude that . Thus, . Now, noticing from (C) that does not contribute to any term in , we conclude that the matrix coefficient of in is only due to the set . Recalling that , we rewrite as
[TABLE]
From (181), we conclude that . Otherwise, , for some integer , a contradiction since . Moreover, now (181) reduces to
[TABLE]
Again, since , we conclude that . Since and both are non-negative, we conclude that , which implies that . Since and both , we conclude that . Thus, . Now, noticing from (C) that does not contribute to any term in , we conclude that the matrix coefficient of in is .
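The sketch below does not reproduce the specific polynomials of Claim VII.2. It illustrates the style of the argument, which identifies exactly which index pairs contribute to a given power of $x$ in a product of two encoding polynomials. It does so on the simpler, well-known MatDot encoding, which is an assumption here: $p_A(x) = \sum_{i=1}^{m} A_i x^{i-1}$ with column blocks $A_i$ of $A$, and $p_B(x) = \sum_{j=1}^{m} B_j x^{m-j}$ with row blocks $B_j$ of $B$. For MatDot, the pairs contributing to $x^{m-1}$ are exactly those with $i = j$, so that coefficient equals $\sum_i A_i B_i = AB$. The block count and matrix sizes are arbitrary.

```python
import numpy as np


def contributing_pairs(m: int, power: int) -> list[tuple[int, int]]:
    """1-indexed pairs (i, j) with (i - 1) + (m - j) == power in p_A(x) * p_B(x)."""
    return [(i, j) for i in range(1, m + 1) for j in range(1, m + 1)
            if (i - 1) + (m - j) == power]


def product_coefficients(A_blocks, B_blocks) -> list[np.ndarray]:
    """Matrix coefficient of x**power in p_A(x) p_B(x), for power = 0 .. 2m - 2."""
    m = len(A_blocks)
    return [sum(A_blocks[i - 1] @ B_blocks[j - 1] for i, j in contributing_pairs(m, power))
            for power in range(2 * m - 1)]


rng = np.random.default_rng(0)
m, n = 3, 6
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A_blocks = np.hsplit(A, m)   # column blocks A_i
B_blocks = np.vsplit(B, m)   # row blocks B_j
coeffs = product_coefficients(A_blocks, B_blocks)

print(contributing_pairs(m, m - 1))         # [(1, 1), (2, 2), (3, 3)]
print(np.allclose(coeffs[m - 1], A @ B))    # True

# Consistency check: the coefficients reassemble p_A(x) p_B(x) at a sample point.
x = 0.7
p_A = sum(A_blocks[i] * x ** i for i in range(m))
p_B = sum(B_blocks[j] * x ** (m - 1 - j) for j in range(m))
print(np.allclose(p_A @ p_B, sum(c * x ** k for k, c in enumerate(coeffs))))  # True
```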
Appendix D Upper Bound on the Condition Number of Gaussian Matrices
We first introduce the following theorem from [31].
Theorem D.1
Let be an matrix, , and let the entries of be independent and identically distributed standard Gaussian random variables. Then, for all ,
[TABLE]
where is the condition number of with respect to the matrix norm induced by .
We now extend Theorem D.1 to bound the condition number of every sub-matrix of a random matrix with standard Gaussian entries, , thereby proving Theorem IX.1.
Proof:
For any subset , let denote the sub-matrix of containing the columns corresponding to and let . Then we have
[TABLE]
where follows from the union bound, and follows from the fact that $\binom{P}{s} \leq P^{s}$ and Theorem D.1. ∎
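The proof combines Theorem D.1 with a union bound over the $\binom{P}{s} \le P^{s}$ ways of choosing columns. The Monte Carlo sketch below illustrates only the union-bound step, for one assumed setup. It uses $m \times P$ standard Gaussian matrices with $m = 6$ and $P = 8$, and takes $s = P - m = 2$ as the number of deleted columns, which is an assumption about the role of $s$. The threshold $t = 200$ is arbitrary. In each draw, the event that the worst submatrix exceeds $t$ is contained in the union of the per-submatrix events. The estimated left-hand probability therefore can never exceed the summed per-submatrix frequencies. The explicit tail bound of Theorem D.1 is not evaluated here.

```python
import itertools
import math

import numpy as np


def union_bound_check(m: int, P: int, t: float, trials: int, seed: int = 0):
    """Estimate P(max_S kappa(G_S) > t) and sum_S P(kappa(G_S) > t) for Gaussian G (m x P)."""
    rng = np.random.default_rng(seed)
    subsets = list(itertools.combinations(range(P), m))
    exceed_any = 0
    exceed_each = np.zeros(len(subsets))
    for _ in range(trials):
        G = rng.standard_normal((m, P))
        kappas = np.array([np.linalg.cond(G[:, list(S)]) for S in subsets])
        exceed_each += kappas > t              # per-submatrix exceedance indicators
        exceed_any += int(kappas.max() > t)    # worst-case exceedance indicator
    return exceed_any / trials, exceed_each.sum() / trials, len(subsets)


m, P = 6, 8
s = P - m                                      # number of deleted columns (assumed meaning of s)
lhs, rhs, count = union_bound_check(m, P, t=200.0, trials=500)
print(f"P(worst > t) ~ {lhs:.3f} <= sum over {count} submatrices ~ {rhs:.3f}")
print(f"C(P, s) = {math.comb(P, s)} <= P**s = {P ** s}")
```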
References
- [1] S. Dutta, V. Cadambe, and P. Grover, “Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 2092–2100.
- [2] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication,” in Advances in Neural Information Processing Systems (NIPS), 2017, pp. 4403–4413.
- [3] M. Fahim, H. Jeong, F. Haddadpour, S. Dutta, V. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” in Communication, Control, and Computing (Allerton), Oct. 2017, pp. 1264–1270.
- [4] S. Dutta, M. Fahim, F. Haddadpour, H. Jeong, V. R. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” CoRR, vol. abs/1801.10292, 2018, accepted to appear in IEEE Transactions on Information Theory.
- [5] S. Dutta, Z. Bai, H. Jeong, T. M. Low, and P. Grover, “A unified coded deep neural network training strategy based on generalized polydot codes,” in 2018 IEEE International Symposium on Information Theory (ISIT), June 2018, pp. 1585–1589, http://arxiv.org/abs/1811.10751.
- [6] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding,” in Machine Learning Systems Workshop, Advances in Neural Information Processing Systems (NIPS), 2016.
- [7] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning, 2017, pp. 3368–3376.
- [8] M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” in Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 5606–5615. [Online]. Available: http://proceedings.mlr.press/v80/ye18a.html
