Random Sampling for Distributed Coded Matrix Multiplication
Wei-Ting Chang, Ravi Tandon

TL;DR
This paper explores the use of random sampling combined with coding techniques to perform approximate distributed matrix multiplication efficiently, balancing recovery threshold and approximation error.
Contribution
It introduces two novel coded randomized sampling schemes that leverage coding and randomization for approximate matrix multiplication in distributed systems.
Findings
Tradeoffs between recovery threshold and approximation error are characterized.
Proposed schemes achieve robustness to stragglers with controlled approximation.
The methods improve efficiency in large-scale matrix computations.
| Recovery Threshold | Independent Sampling (Uniform) | Independent Sampling (Optimal) | Set-wise Sampling (Uniform) | Set-wise Sampling (Optimal) |
Department of Electrical and Computer Engineering
University of Arizona, Tucson, AZ, USA
E-mail: {wchang, tandonr}@email.arizona.edu
Abstract
Matrix multiplication is a fundamental building block for large-scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix multiplication in a way that is robust to stragglers (i.e., machines that may perform slower computations). In many scenarios, approximate matrix multiplication, i.e., computation that allows for a tolerable error, is sufficient in place of exact computation. Such approximate schemes make use of randomization techniques to speed up the computation process. In this paper, we initiate the study of approximate coded matrix multiplication, and investigate the joint synergies offered by randomization and coding. Specifically, we propose two coded randomized sampling schemes that use (a) codes to achieve a desired recovery threshold and (b) random sampling to obtain an approximation of the matrix multiplication. Tradeoffs between the recovery threshold and the approximation error obtained through random sampling are investigated for a class of coded matrix multiplication schemes.
Keywords – Matrix multiplication, Random sampling, Coded Distributed Computing
I Introduction
This work was supported by NSF Grant CAREER 1651492.
Matrix multiplication is one of the most fundamental building blocks for applications in fields such as signal and image processing, machine learning, optimization, and wireless communications. Outsourcing the computations to distributed machines has become a preferred way to speed up the process when dealing with large-scale data. However, distributed systems suffer from the straggler effect, where the slowest worker(s) can limit the speed-ups offered by distributed computation.
To mitigate the impact of stragglers, the idea of using coded distributed computation has gained significant recent interest. In general, these codes are used to introduce redundancy into the computations. For example, by applying one of the simplest codes, the repetition code, one can let multiple machines work on the same computation. One can then obtain the desired result as soon as the fastest machine finishes its assigned task. Much more efficient codes have also been applied to distributed computing problems, and significant recent progress has been made on understanding the additional speed-ups gained by mitigating stragglers using codes. Several codes that are particularly efficient for distributed matrix multiplication include Polynomial codes, MatDot codes and Lagrange codes [1, 2, 3, 4]. These codes add redundancy in such a way that one can obtain the desired result from the responses of an arbitrary subset of machines. The smallest number of machines that allows perfect recovery of the computation is referred to as the recovery threshold.
In contrast to adding redundancy, another methodology to speed up matrix multiplication comes from the idea of randomization. By allowing some tolerable error in the computation, randomized algorithms can provide speed-ups by working on matrices of smaller dimensionality. However, the randomization techniques must be carefully designed, in order to provide guarantees on the error. Random sampling and random projection are two commonly used techniques for this purpose. Random sampling algorithms sample either the columns or rows from the original matrix to construct sketches of original matrices, and the subsequent task is performed on sketched matrices. The key to a good sampling scheme is to carefully design what to sample, since not all columns/rows carry the same amount of information. Several works on random sampling include [5, 6, 7, 8, 9, 10]. Random projection algorithms construct the sketch matrix by projecting the original matrix to a vector space with a lower dimension. Projection algorithms are typically designed to have good distance preserving properties (Johnson-Lindenstrauss lemma [11, 12]), and have been investigated in various works [11, 12, 13, 14, 15, 16].
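As a concrete illustration of random sampling for approximate matrix multiplication, the following sketch draws column/row pairs with replacement and rescales them so that the estimate is unbiased, in the spirit of the Monte Carlo approach of [5]. The function name `sampled_matmul`, its parameters, and the norm-based default probabilities are illustrative assumptions, not notation taken from this paper.

```python
import numpy as np

def sampled_matmul(A, B, s, rng, probs=None):
    """Approximate A @ B from s sampled column/row pairs (drawn with replacement)."""
    n = A.shape[1]
    if probs is None:
        # Assumed default: p_i proportional to ||A[:, i]|| * ||B[i, :]||.
        weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
        probs = weights / weights.sum()
    idx = rng.choice(n, size=s, replace=True, p=probs)
    # Rescale each sampled outer product by 1 / (s * p_i) so the estimate is unbiased.
    scale = 1.0 / (s * probs[idx])
    return (A[:, idx] * scale) @ B[idx, :]
```

The sketch multiplies an `r x s` matrix by an `s x c` matrix instead of the full inner dimension, which is where the speed-up comes from; the price is an error that shrinks as `s` grows.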
**Main Contributions:** In this paper, we explore the synergies between coding and randomization, and study the tradeoff between reconstruction error and recovery threshold for distributed matrix multiplication. To this end, we devise two novel coded sampling schemes that can achieve various levels of speed-up depending on how well one wishes to approximate the desired result. For the scope of this paper, we focus on MatDot codes [3], and design sampling strategies tailored to these codes. We present a family of coded sampling schemes, which sample a subset of columns from the matrices and then apply MatDot codes to the sampled matrices. We analyze two sampling strategies: one where the sampling of rows/columns is done independently (with replacement), and one where we sample a subset of rows/columns (without replacement).
We show that if the matrices to be multiplied are divided into parts (for details, see Section IV), and for any integer , a recovery threshold of is achievable. Moreover, the expected approximation errors of the proposed coded sampling schemes for a recovery threshold of are as follows: , where denotes the set of sampled indices and when coded set-wise sampling scheme is used; and when coded independent sampling scheme is used. These results reveal a tradeoff between recovery threshold and approximation error, i.e., a lower recovery threshold can be obtained by allowing reconstruction error.
II System Model
We consider a distributed system which consists of a master and workers. Each worker is connected to the master through a separate link. The goal of the master is to approximate matrix multiplication , where and , using workers, in the presence of stragglers, for some sufficiently large field . We note that depending on the computation strategy used, the master may not need to wait for all workers to recover the approximation of . The smallest number of workers needed to recover the approximation is referred to as the recovery threshold .
To tolerate stragglers, the master encodes and separately, and workers multiply the encoded versions of and . The encoding functions used are and , where and are the encoding functions for worker . Specifically, the encoded matrices for worker are and , where and . We denote the answer from worker as . The master must be able to decode the desired result from any workers. We denote the approximated result as , where is the decoding function. The performance of coded sampling schemes is measured through the expected approximation error , where denotes the Frobenius norm of a matrix . Note that we choose Frobenius norm for its properties, which will be useful for our analysis. Other norms could potentially be used for evaluating the schemes.
III Coded Matrix Multiplication
For the scope of this paper, we focus on one such code, namely MatDot codes [3]. We show the intuition behind MatDot codes and their application to approximate matrix multiplication through an illustrative example.
Example 1.
Consider a matrix multiplication problem with workers using -MatDot code, where . The input matrices are partitioned into submatrices as follows,
[TABLE]
where and . The product of can then be written as,
[TABLE]
The submatrices and are encoded as follows,
[TABLE]
for , where and have the same dimensions as and , and is a distinct non-zero element assigned to worker . After encoding, worker computes and sends the result to the master. Without loss of generality, we assume that the first workers respond and the master receives,
[TABLE]
It can be seen that the results can be viewed as distinct evaluations of a degree polynomial. Thus, the master can apply any polynomial interpolation technique and obtain the coefficients and using any evaluations received. Since the desired result can be obtained from any evaluations, we say -MatDot code achieves a recovery threshold of .
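To make the structure of such an example concrete, the following worked equations assume a partition into two submatrices and evaluation points $x_k$; both the partition size and the symbols are assumptions chosen for illustration, following the MatDot construction of [3].

```latex
\begin{align*}
A &= \begin{bmatrix} A_1 & A_2 \end{bmatrix}, \qquad
B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \qquad
AB = A_1 B_1 + A_2 B_2, \\
\tilde{A}_k &= A_1 + A_2 x_k, \qquad
\tilde{B}_k = B_1 x_k + B_2, \\
\tilde{A}_k \tilde{B}_k &= A_1 B_2 + (A_1 B_1 + A_2 B_2)\, x_k + A_2 B_1\, x_k^2 .
\end{align*}
```

Under this assumption the product is a degree-2 polynomial in $x_k$, so any 3 evaluations determine its coefficients, and the coefficient of $x_k$ is exactly $AB$.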
We note that there are many other codes that could potentially be applied to our problem, such as Polynomial and Lagrange codes [1, 2, 4]. Investigating randomization schemes for other codes is part of our ongoing work.
We now introduce the idea of randomization in this context. In particular, for scenarios where approximate matrix multiplication is sufficient, we show that the recovery threshold can be even reduced to . Using the same partition as the previous example, if we want the recovery threshold to be , the master can follow the following strategy: it samples one of the submatrices of and (i.e., either or with a certain probability). The chosen index is a Bernoulli random variable . It then assigns each worker to compute . It waits for only worker, and declares as the approximate answer for . It can be readily shown that the expected value of is with proper scaling. Although is an unbiased estimator of on average, there will be some error in practice, and the sampling scheme must be designed to (a) give an unbiased estimate of , and (b) minimize the resulting error as much as possible. We first briefly summarize the general construction of MatDot, followed by the details of our randomized sampling scheme.
To apply MatDot codes for any that divides , the input matrices and are partitioned into disjoint submatrices horizontally and vertically, respectively, i.e., where and . The submatrices of and are encoded into for worker , where is a distinct non-zero element in assigned to worker . Workers compute the product of their respective and , and return the results to the master. The results can be seen as a polynomial evaluated at distinct points, i.e., , where . The degree of this polynomial is , hence, the coefficients of the polynomial can be interpolated using any evaluations. Note that the desired result is the sum of , and it is the coefficient of . With the ability of computing the desired result from any workers, we say -MatDot achieves a recovery threshold of (see [3] for details).
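The general construction can be sketched in code as follows. This is a minimal illustration under stated assumptions: `m` denotes the number of submatrices, indices run from 0, the arithmetic is over floating-point reals rather than the finite field of the system model, and the function names are hypothetical.

```python
import numpy as np

def matdot_encode(A_blocks, B_blocks, x):
    """Evaluate p_A(x) = sum_i A_i x^i and p_B(x) = sum_j B_j x^(m-1-j), i, j = 0..m-1."""
    m = len(A_blocks)
    A_enc = sum(A_blocks[i] * x**i for i in range(m))
    B_enc = sum(B_blocks[j] * x**(m - 1 - j) for j in range(m))
    return A_enc, B_enc

def matdot_decode(xs, results, m):
    """Interpolate the degree-(2m-2) product polynomial and return its x^(m-1) coefficient."""
    xs = np.asarray(xs, dtype=float)
    if len(xs) != 2 * m - 1:
        raise ValueError("exactly 2m-1 worker results are required")
    V = np.vander(xs, 2 * m - 1, increasing=True)   # V[k, d] = xs[k] ** d
    Y = np.stack([r.ravel() for r in results])      # one flattened result per worker
    coeffs = np.linalg.solve(V, Y)                  # row d holds the coefficient of x^d
    return coeffs[m - 1].reshape(results[0].shape)
```

Each worker `k` would compute `A_enc @ B_enc` for its own point `x_k`. The product contains the terms `A_i B_j x^(i + m - 1 - j)`, so the `x^(m-1)` coefficient collects exactly the terms with `i = j`, which is the desired product. Solving a Vandermonde system is only one of several possible interpolation methods and can be ill-conditioned for large `m`.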
IV Coded Sampling for Approximate Matrix Multiplication
In this section, we present two coded sampling schemes and study the tradeoff between recovery threshold and approximation error. To apply MatDot, matrices and are partitioned into submatrices horizontally and vertically, respectively. Both schemes sample submatrices from and the corresponding submatrices from , and encode them using MatDot, where the choice of controls both the approximation error and the recovery threshold.
IV-A Coded Set-wise Sampling
For the coded set-wise sampling scheme, the master samples a subset of the indices of submatrices, where is picked according to probability . We denote the sampled submatrices as and . The sampled submatrices are then encoded as,
[TABLE]
where the scaling is done to ensure that the approximation is an unbiased estimator of and the choice of the constant will become clear in the analysis. The goal is to approximate using the sum of . Note that this sum is originally a part of . Workers are assigned to compute their respective and return the results. The master receives the results,
[TABLE]
for , corresponding to any workers. As shown in Section III, since the degree of this polynomial is , the coefficients of the polynomial can be interpolated using the results from any workers. The master can then obtain the approximation .
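The sampling and scaling step of the set-wise scheme can be sketched as below. The notation is assumed for illustration: `m` submatrices, a sampled subset `S` of size `s` drawn with probability `p_S`, and a scaling constant `c` equal to the number of size-`s` subsets that contain a given index. For brevity the scaling is applied to the sum of block products rather than inside the MatDot encoding, which gives the same estimate.

```python
import itertools
import math
import numpy as np

def setwise_estimate(A_blocks, B_blocks, S, p_S):
    """Scaled sum of the sampled block products for one sampled subset S."""
    m, s = len(A_blocks), len(S)
    c = math.comb(m - 1, s - 1)  # number of size-s subsets containing a given index
    return sum(A_blocks[i] @ B_blocks[i] for i in S) / (c * p_S)

def sample_subset(m, s, probs, rng):
    """Draw one size-s subset of {0..m-1}; probs follows itertools.combinations order."""
    subsets = list(itertools.combinations(range(m), s))
    k = rng.choice(len(subsets), p=probs)
    return subsets[k], probs[k]

def exact_expectation(A_blocks, B_blocks, s, probs):
    """E[C_hat], computed by enumerating every size-s subset."""
    m = len(A_blocks)
    subsets = itertools.combinations(range(m), s)
    return sum(p * setwise_estimate(A_blocks, B_blocks, S, p)
               for S, p in zip(subsets, probs))
```

Enumerating the subsets shows why the estimate is unbiased for any valid distribution: the factor `p_S` cancels, and every block product appears in exactly `c` subsets. When `s = m` there is a single subset with probability 1 and the estimate equals the exact product.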
Our main result is stated in the following Theorem:
Theorem 1.
For an approximate coded matrix multiplication problem, to achieve a recovery threshold of using -MatDot codes, the expected approximation error of the coded set-wise sampling scheme is as follows,
[TABLE]
by sampling using the optimal distribution shown in the analysis, where denotes the set of sampled indices and .
To prove Theorem 1, we first show that the approximation is an unbiased estimator of . We start by looking at the expected value of the th element of the approximation:
[TABLE]
where (7) follows from the definition of expected value and the design of the scheme, and is the number of times each appears in the summation. Thus,
[TABLE]
Since , we have
[TABLE]
We next find the expected approximation error by calculating:
[TABLE]
where (12) follows from placing the double summations before .
Note that is a constant for fixed and , hence, we can use the method of Lagrange multipliers to find the optimal by putting as a constraint on the first term in (12) and solve for the that minimizes the error. The optimal can be found to be . Plugging in (12) completes the proof of Theorem 1.
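The Lagrange-multiplier step can be sketched as follows. The notation is assumed for illustration: $m$ is the number of submatrices, $s$ the number sampled, $c = \binom{m-1}{s-1}$, $M_S = \sum_{i \in S} A_i B_i$, and the unbiased estimator is taken to be $\hat{C} = M_S / (c\, p_S)$ with all $M_S \neq 0$.

```latex
\begin{align*}
\mathbb{E}\,\|AB - \hat{C}\|_F^2
  &= \sum_{S} \frac{\|M_S\|_F^2}{c^2\, p_S} - \|AB\|_F^2, \\
\mathcal{L}(p, \lambda)
  &= \sum_{S} \frac{\|M_S\|_F^2}{c^2\, p_S}
     + \lambda \Big( \sum_{S} p_S - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial p_S} = 0
  &\;\Longrightarrow\; p_S = \frac{\|M_S\|_F}{c \sqrt{\lambda}}
  \;\Longrightarrow\; p_S^{\star} = \frac{\|M_S\|_F}{\sum_{S'} \|M_{S'}\|_F}, \\
\mathbb{E}\,\|AB - \hat{C}\|_F^2 \,\Big|_{p^{\star}}
  &= \frac{1}{c^2} \Big( \sum_{S} \|M_S\|_F \Big)^2 - \|AB\|_F^2 .
\end{align*}
```

As a sanity check under these assumptions, setting $s = m$ leaves a single subset with $c = 1$ and $M_S = AB$, so the expected error is zero.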
We note that the computational complexity of finding the optimal probabilities is , which can be high. One way to overcome this issue is to sample and using a uniform distribution, at the cost of a higher approximation error. We next propose an alternative (and simpler) sampling strategy and obtain the corresponding approximation error.
IV-B Coded Independent Sampling
For coded independent sampling, at each iteration, the master samples an index according to probability , the probability that and are sampled at time . After sampling indices, the corresponding submatrices are encoded into . Workers are assigned to compute their respective . The results received by the master are
[TABLE]
where . The degree of this polynomial is , hence, the coefficients of the polynomial can be interpolated by using the results from any workers. The master can thus obtain the approximation . The expected error is (following similar steps as in previous section) as follows:
[TABLE]
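The independent sampling step can be sketched in the same way. The notation is assumed for illustration: `m` submatrices, `s` indices drawn i.i.d. with replacement according to `probs`, and each sampled block product rescaled by `1 / (s * p_i)`. The closed-form error below follows from this assumed estimator and is not claimed to match the paper's expression term by term.

```python
import numpy as np

def independent_estimate(A_blocks, B_blocks, indices, probs):
    """Average of the rescaled block products A_i B_i / p_i over the sampled indices."""
    s = len(indices)
    return sum(A_blocks[i] @ B_blocks[i] / probs[i] for i in indices) / s

def sample_indices(m, s, probs, rng):
    """Draw s indices i.i.d. (with replacement) from {0..m-1}."""
    return rng.choice(m, size=s, replace=True, p=probs)

def expected_error(A_blocks, B_blocks, s, probs):
    """E||AB - C_hat||_F^2 = (sum_i ||A_i B_i||_F^2 / p_i - ||AB||_F^2) / s."""
    prods = [a @ b for a, b in zip(A_blocks, B_blocks)]
    AB = sum(prods)
    second_moment = sum(np.linalg.norm(P, "fro") ** 2 / p
                        for P, p in zip(prods, probs))
    return (second_moment - np.linalg.norm(AB, "fro") ** 2) / s
```

Because the same index can be drawn more than once, the estimate does not become exact when `s` reaches `m`, in contrast to sampling a subset without replacement.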
IV-C Simulation Results
In this section, we present simulation results to show the performance of the two coded randomized sampling schemes. We consider the case where and , where and are partitioned into submatrices. With , the master can sample either or submatrices and achieve recovery thresholds of or , respectively. The normalized errors shown in Fig. 2, 2 and Table I are calculated by computing . It can be seen in Fig. 2 and 2 that the empirical errors obtained with the optimal sampling distributions are lower than those obtained with uniform distributions. Table I shows that, in most cases, coded set-wise sampling yields better approximations than coded independent sampling for the same recovery threshold. This is because the master may sample the same submatrices multiple times under the coded independent sampling scheme, whereas in coded set-wise sampling the master always samples distinct submatrices. Furthermore, the errors of coded set-wise sampling always go to zero when as it is equivalent to performing the exact computation of .
V Conclusion
In this paper, we studied the problem of approximate coded matrix multiplication. We presented two novel coded sampling schemes where a subset of columns/rows is sampled from the matrices. The sampled submatrices are then encoded using MatDot codes. The results reveal an interesting tradeoff between recovery threshold and approximation error. Generalizing these ideas for other coded computation schemes is an interesting future research direction.
References
- [1] Qian Yu, Mohammad Ali Maddah-Ali, and Amir Salman Avestimehr, “Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication,” CoRR, vol. abs/1705.10464, 2017. [Online]. Available: http://arxiv.org/abs/1705.10464
- [2] Qian Yu, Mohammad Ali Maddah-Ali, and Amir Salman Avestimehr, “Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding,” CoRR, vol. abs/1801.07487, 2018. [Online]. Available: http://arxiv.org/abs/1801.07487
- [3] Sanghamitra Dutta, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck R. Cadambe, and Pulkit Grover, “On the Optimal Recovery Threshold of Coded Matrix Multiplication,” CoRR, vol. abs/1801.10292, 2018. [Online]. Available: http://arxiv.org/abs/1801.10292
- [4] Qian Yu, Netanel Raviv, Jinhyun So, and Amir Salman Avestimehr, “Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy,” CoRR, vol. abs/1806.00939, 2018. [Online]. Available: http://arxiv.org/abs/1806.00939
- [5] Petros Drineas, Ravi Kannan, and Michael W. Mahoney, “Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication,” SIAM Journal on Computing, vol. 36, no. 1, pp. 132–157, 2006.
- [6] Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang, “Matrix approximation and projective clustering via volume sampling,” in Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2006, pp. 1117–1126.
- [7] Christos Boutsidis, Michael W. Mahoney, and Petros Drineas, “An improved approximation algorithm for the column subset selection problem,” CoRR, vol. abs/0812.4293, 2008. [Online]. Available: http://arxiv.org/abs/0812.4293
- [8] Venkatesan Guruswami and Ali Kemal Sinop, “Optimal column-based low-rank matrix reconstruction,” CoRR, vol. abs/1104.1732, 2011. [Online]. Available: http://arxiv.org/abs/1104.1732
