Coded Matrix Multiplication on a Group-Based Model
Muah Kim, Jy-yong Sohn, Jaekyun Moon

TL;DR
This paper introduces a group-based coding scheme for distributed matrix multiplication that models real-world server clusters, achieving near-optimal performance and reduced decoding complexity.
Contribution
It proposes a novel group code tailored for clustered distributed systems, reflecting practical conditions and improving decoding efficiency.
Findings
Achieves asymptotic optimality in large-scale regimes.
Demonstrates near-optimal performance for finite system sizes.
Reduces decoding complexity through parallel decoding.
| Code | Decoding complexity | Code parameters |
|---|---|---|
| MDS | | |
| Product | | |
| Group | | |
School of Electrical Engineering, KAIST, Daejeon, Republic of Korea
Abstract
Coded distributed computing has been considered a promising technique for making large-scale systems robust to "straggler" workers. Yet, no practical system model for distributed computing has been available that reflects the clustered or grouped structure of real-world computing servers. Nor have the large variations in computing power and bandwidth capabilities across different servers been properly modeled. We suggest a group-based model to reflect practical conditions and develop an appropriate coding scheme for this model. The suggested code, called group code, employs parallel encoding for each group. We show that the suggested coding scheme can asymptotically achieve optimal computing time in regimes of infinite $n$, the number of workers. While theoretical analysis is conducted in the asymptotic regime, numerical results also show that the suggested scheme achieves near-optimal computing time for any finite but reasonably large $n$. Moreover, we demonstrate that the decoding complexity of the suggested scheme is significantly reduced by virtue of parallel decoding.
I Introduction
In the era of big data, distributed computing has been recognized as a solution for realizing large-scale machine learning [1]. Unlike conventional centralized systems, a distributed computing system divides the computational work into subtasks and distributes them over multiple nodes. This system successfully supports large-scale machine learning by reducing the computing time via parallel computing.
Yet, there is still room for improvement, as the system is slowed down by the random nature of computing nodes, where certain nodes are inevitably slower than others. In particular, the performance of a distributed system is shown to be dramatically degraded by the slowest workers, the "stragglers", whose computational latency lies in the tail of the latency distribution [2]. Lee et al. suggested coded computation as a straggler-proof scheme, which speeds up matrix multiplication by introducing redundancy with a maximum distance separable (MDS) code [3]. Subsequently, coded computation has been shown to effectively improve the performance of computing systems with regard to matrix-matrix multiplication [4, 5, 6], distributed gradient descent [7, 8], convolution [9], Fourier transform [10], and matrix sparsification [11, 12]. Moreover, for matrix multiplication, new models reflecting practical aspects of computing systems, such as the tree structure and heterogeneity, have been suggested and analyzed [13, 14].
In recent years, distributed cloud computing services such as Amazon EC2 have enabled customers to handle large-scale computation [15]. Real distributed computing systems generally adopt a multi-rack structure, where the computing workers are grouped together in multiple racks [16, 17, 18]. Moreover, in the real world, the workers' latency statistics are heterogeneous due to a mixed use of hardware with varying performance or the dynamics of multiple user requests over shared resources [19]. So far, the homogeneous grouped structure has been considered in [14], and heterogeneous workers without a grouped structure have been studied in [13]. However, system solutions that reflect both practical conditions, namely the grouped structure and heterogeneity (in terms of the number of workers in each group as well as the bandwidth of the communication links associated with the groups), are yet to be established.
I-A Main Contributions
We design a group-based computing model as shown in Fig. 1, where $n$ workers are dispersed into $L$ groups, each having a different number of nodes and distinct computing time statistics. We assume that group $i$ has $n_i$ nodes, each of which has a computing time given by an exponential random variable with a group-specific rate. This model is more practical than existing ones because it resembles tree-shaped (grouped) distributed computing systems such as the Hadoop file system while also considering the heterogeneity of the groups.
Considering the scenario of computing $k$ tasks in the suggested model, we show that an $(n,k)$ MDS code achieves the optimal computing time. Yet, this scheme requires a prohibitive decoding complexity as the system size increases. In addition, it is hard to obtain a closed-form expression for the optimal computing time due to the heterogeneous nature of the model.
To address these issues, we propose a coding scheme called group code, which divides the total $k$ tasks into $L$ partitions and then employs $L$ distinct MDS codes. We show that a carefully designed group code can asymptotically achieve the optimal computing time as $n$ goes to infinity. In addition, the suggested group code can reduce the decoding complexity down to a factor of compared to an MDS code, where . Furthermore, we obtain a closed-form expression for the expected optimal computing time when the number of workers goes to infinity.
I-B Related Works
Previous works on coded computation either achieve the optimal computing time with a prohibitive decoding complexity, or reduce the decoding complexity at the expense of optimality in computing time. In addition, most of them assume homogeneous workers. Applying an MDS code in homogeneous systems is suggested in [3], which achieves the optimal computing time but requires a huge decoding complexity as the system grows. Considering a system model with heterogeneous workers, the authors of [13] suggested a coding scheme that achieves an asymptotically optimal computing time. However, its decoding process requires a computational complexity of . Moreover, the coding schemes suggested in [4, 14, 6] encode the tasks along multiple dimensions, which can effectively reduce the decoding complexity by virtue of parallel decoding or a peeling decoding scheme. However, these codes lose the MDS property and thereby cannot achieve the optimal computing time. Besides, these codes do not provide solutions for practical systems with heterogeneous groups. Compared to these existing works, our suggested scheme is shown not only to asymptotically achieve the optimal computing time but also to require a low decoding complexity.
I-C Notations
Here, we list the mathematical notations used in this paper. For a positive integer $n$, the set of positive integers less than or equal to $n$ is denoted by $[n]$. For a matrix with multiple rows, represents row-wise division of , i.e. . We use $C_{\mathrm{G}}(\bm{n},\bm{k})$ to denote an $(\bm{n},\bm{k})$ group code and $C_{\mathrm{MDS}}(n,k)$ to denote an $(n,k)$ MDS code. The definition of the group code is given in Section II-A. We denote the floor, ceiling and round functions of a real value by and .
II System Model and Target Problem
II-A System Model
Consider $n$ workers that are spread into $L$ groups as shown in Fig. 1. Here, group $i$ has $n_i$ workers whose response times are described by i.i.d. random variables with a parameter of . We define this system as an group system, where and . For simplicity, we call the worker in group as for and . We implement a matrix-vector multiplication on this system, where is a work matrix and is an input vector for some positive integers and . Now, the work matrix is divided into $k$ equal-sized submatrices as , where $k$ is a positive integer that divides the number of rows of the work matrix, and for .
The task of computing the matrix-vector product is distributed to the workers as follows. First, we define $\bm{k}=(k_1,\dots,k_L)$ as a task allocation vector, whose elements are positive integers satisfying $\sum_{i=1}^{L}k_i=k$. The set of $k$ submatrices is then partitioned into $L$ disjoint subsets such that the $i$-th subset contains $k_i$ submatrices for $i\in[L]$. Afterwards, the elements of the $i$-th subset are encoded with an $(n_i,k_i)$ MDS code, which yields $n_i$ coded submatrices for group $i$. Each worker in group $i$ stores one of these coded submatrices and computes its product with the input vector when it receives the input vector from the master. We call this coding scheme an $(\bm{n},\bm{k})$ group code, denoted by $C_{\mathrm{G}}(\bm{n},\bm{k})$. Fig. 2 illustrates an example of a group code with two groups: the work matrix is divided into two sets of submatrices, which are encoded with two different MDS codes, respectively. Each worker individually transmits its computational result to the master when its computation is finished. To obtain the computational output, the master needs at least $k_i$ computational results from each group $i$ to decode the corresponding MDS code. Note that this model can be directly applied to matrix-matrix multiplication, where the input vector is replaced by a matrix .
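The encoding and decoding steps above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes a real Vandermonde code with evaluation points $1,\dots,n_i$ as the $(n_i,k_i)$ MDS code of group $i$ (the text does not fix a construction), it uses exact `Fraction` arithmetic to sidestep the poor numerical conditioning of real Vandermonde matrices, and all helper names (`group_encode`, `mds_decode`, `run_group_code`, and so on) are hypothetical.

```python
from fractions import Fraction
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def split_rows(A, k):
    """Row-wise division of A into k equal-sized submatrices (k must divide len(A))."""
    assert len(A) % k == 0, "k must divide the number of rows"
    r = len(A) // k
    return [A[j * r:(j + 1) * r] for j in range(k)]


def mds_encode(blocks, n_i):
    """Hypothetical (n_i, k_i) Vandermonde code: coded block p is sum_j p**j * blocks[j], p = 1..n_i."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    coded = []
    for p in range(1, n_i + 1):
        coef = [Fraction(p) ** j for j in range(len(blocks))]
        coded.append([[sum(c * b[r][q] for c, b in zip(coef, blocks)) for q in range(cols)]
                      for r in range(rows)])
    return coded


def solve(V, Y):
    """Solve V Z = Y exactly by Gauss-Jordan elimination (each row of Y is a vector)."""
    n = len(V)
    M = [list(V[i]) + list(Y[i]) for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def mds_decode(results, k_i):
    """Recover [A_1 x, ..., A_{k_i} x] from any k_i results {evaluation point p: coded_p x}."""
    points = sorted(results)[:k_i]
    V = [[Fraction(p) ** j for j in range(k_i)] for p in points]
    return solve(V, [results[p] for p in points])


def group_encode(A, k_vec, n_vec):
    """Split A into k = sum(k_vec) blocks and encode the i-th subset with an (n_i, k_i) code."""
    blocks = split_rows(A, sum(k_vec))
    groups, start = [], 0
    for k_i, n_i in zip(k_vec, n_vec):
        groups.append(mds_encode(blocks[start:start + k_i], n_i))
        start += k_i
    return groups  # groups[i][j] is the coded submatrix stored by worker j of group i


def run_group_code(seed=0):
    """Encode, let an arbitrary k_i workers per group finish, decode; return (decoded, exact)."""
    rng = random.Random(seed)
    A = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(6)]  # 6x3 work matrix
    x = [rng.randint(-5, 5) for _ in range(3)]
    k_vec, n_vec = [2, 1], [4, 3]  # illustrative: k = 3 blocks of 2 rows, two groups
    decoded = []
    for k_i, coded in zip(k_vec, group_encode(A, k_vec, n_vec)):
        finished = rng.sample(range(len(coded)), k_i)  # any k_i workers of this group
        results = {j + 1: matvec(coded[j], x) for j in finished}
        for block_result in mds_decode(results, k_i):
            decoded.extend(block_result)
    return decoded, matvec(A, x)
```

Calling `run_group_code(seed)` returns the decoded product and the directly computed product; they agree for every choice of the $k_i$ finished workers in each group, which is exactly the property the master relies on when it waits for $k_i$ results per group.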
We adopt the exponential distribution model for the completion time of a worker, which is defined as the time taken for both the computation and the transmission of the computed result to the master. This model has also been assumed in other papers on coded computation [4, 14]. Unlike these papers, however, a worker in group has the distribution parameter of , where varies among different groups. More precisely, the completion time of worker is defined by its cumulative distribution function as for time . Here, the completion time has the rate of since the number of rows in the submatrix becomes smaller as increases.
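The completion-time model can be sampled by inverse-transform sampling. This sketch assumes that the CDF of a worker in group $i$ is $1-e^{-k\mu_i t}$ for $t\ge 0$, i.e. an exponential law whose rate is the number of tasks $k$ times a group parameter that we call $\mu_i$ here; the symbol $\mu_i$ and the helper `sample_completion_times` are our own labels.

```python
import math
import random


def sample_completion_times(n_vec, mu_vec, k, rng):
    """Draw one realization of all completion times, grouped by group.

    Assumption: a worker in group i has CDF F(t) = 1 - exp(-k * mu_i * t), t >= 0.
    """
    times = []
    for n_i, mu_i in zip(n_vec, mu_vec):
        rate = k * mu_i
        # Inverse-transform sampling: solve F(t) = U for U ~ Uniform[0, 1).
        times.append([-math.log(1.0 - rng.random()) / rate for _ in range(n_i)])
    return times


rng = random.Random(1)
groups = sample_completion_times([20000, 20000], [1.0, 2.0], k=4, rng=rng)
means = [sum(g) / len(g) for g in groups]
print(means)  # each mean should be near 1 / (k * mu_i), i.e. 0.25 and 0.125
```

Because the rate grows with $k$, the mean completion time of every worker shrinks as $1/k$, matching the statement that each submatrix has fewer rows when $k$ is larger.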
II-B Target Problem
This paper mainly aims to analyze the total execution time of group codes, which refers to the entire time taken for computing and decoding. The computing time is the time taken for the master to gather computational subtasks from the workers, while the decoding time is the time taken to recover the result of the original computation from the gathered subtasks. In this paper, we assume that the encoding time complexity is negligible compared to the computing and decoding times. This is because we focus on scenarios of multiplying varying input vectors with the same work matrix, which is encoded once prior to the computation. Thus, we have
[TABLE]
when code is applied to the system.
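We focus on analyzing the computing time of group codes, which is denoted by $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))$. Recall that the computing time of an $(\bm{n},\bm{k})$ group code is the time at which every group $i$ has at least $k_i$ workers that have finished their tasks. Let $T^{(i)}_{k_i:n_i}$ be the $k_i$-th smallest value among the completion times of the $n_i$ workers in group $i$. Then, $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))$ can be expressed as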
$$T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))=\max_{i\in[L]}T^{(i)}_{k_i:n_i}.$$
Since it is hard to find a closed-form expression for $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]$ when $n$ is finite, we set our main problem as obtaining this expected value as $n$ goes to infinity, i.e.,
[TABLE]
Here, we assume and for .
III Optimal Computing Time Analysis
Here we find the optimal computing time of a given group system. Theorem 1 states that applying an MDS code achieves the optimal computing time. Consider applying an $(n,k)$ MDS code to the $k$ submatrices, resulting in $n$ coded submatrices. The coded submatrices are then distributed to the $n$ workers regardless of the groups they belong to. We denote the computing time of the MDS code by $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))$.
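Theorem 1.
Consider computing $k$ tasks on a group system with $n$ workers. Then, an $(n,k)$ MDS code achieves the optimal computing time. In other words, for an arbitrary $(n,k)$ linear code $C$,
$$T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))\le T_{\mathrm{comp}}(C).$$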
Proof.
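Given an arbitrary realization of the completion times of the $n$ workers, we can consider their order statistics $T_{1:n}\le T_{2:n}\le\cdots\le T_{n:n}$. Recall that an $(n,k)$ linear code cannot recover the original message if there are more than $n-k$ erasures, which leads to $T_{\mathrm{comp}}(C)\ge T_{k:n}$. By the MDS property, we have $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))=T_{k:n}$, which completes the proof. ∎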
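For a single realization of completion times, both computing times reduce to order statistics, and the inequality of Theorem 1 can be checked directly: by the time $\max_{i}T^{(i)}_{k_i:n_i}$, at least $\sum_i k_i = k$ workers have finished, so $T_{k:n}$ cannot be larger. The helpers below are a hypothetical sketch of this bookkeeping; the small realization and the random check use illustrative values.

```python
import random


def kth_smallest(values, k):
    """k-th order statistic (1-indexed) of a list of completion times."""
    return sorted(values)[k - 1]


def group_code_time(times_by_group, k_vec):
    """T_comp of a group code: every group i needs k_i finished workers."""
    return max(kth_smallest(t, k_i) for t, k_i in zip(times_by_group, k_vec))


def mds_code_time(times_by_group, k):
    """T_comp of an (n, k) MDS code: any k finished workers suffice, regardless of group."""
    return kth_smallest([t for group in times_by_group for t in group], k)


realization = [[0.3, 0.1, 0.7], [0.2, 0.9]]
print(group_code_time(realization, [2, 1]))  # 0.3
print(group_code_time(realization, [1, 2]))  # 0.9
print(mds_code_time(realization, 3))         # 0.3

# Per-realization check of T_{k:n} <= max_i T^{(i)}_{k_i:n_i} on random data.
rng = random.Random(0)
for _ in range(1000):
    times = [[rng.expovariate(1.0) for _ in range(5)], [rng.expovariate(2.0) for _ in range(7)]]
    assert mds_code_time(times, 6) <= group_code_time(times, [2, 4])
```

The two allocations of the same realization show why the task allocation matters: with $(2,1)$ the group code matches the MDS code, while $(1,2)$ waits for the slowest worker of the second group.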
IV Computing Time Analysis
In this section, we provide the computing time analysis when the workers are dispersed into two groups, i.e., $L=2$.
IV-A Computing Time for an Arbitrary Task Allocation
For simplicity, we denote the task allocation vector as $\bm{k}=(k_1,k_2)$. The computing time of an $(\bm{n},\bm{k})$ group code for $L=2$ can be expressed as $\max(T^{(1)}_{k_1:n_1},T^{(2)}_{k_2:n_2})$ by definition. Lemma 1 provides the expected computing time of an $(\bm{n},\bm{k})$ group code when $n$ goes to infinity.
Lemma 1.
Consider a group system with $L=2$ groups. Then, the expected computing time of an $(\bm{n},\bm{k})$ group code satisfies the following:
[TABLE]
Proof.
The proof is deferred to Appendix A. ∎
This lemma shows that, in the asymptotic regime of large $n$, the expected computing time of a group code can be easily obtained for a given system and task allocation.
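The expression in Lemma 1 is not reproduced here, but its ingredients can be checked numerically with standard facts about exponential order statistics (not results quoted from this paper): for $n$ i.i.d. exponential variables with rate $r$, $\mathbb{E}[T_{k:n}]=(H_n-H_{n-k})/r$ by Rényi's representation, which approaches the quantile $\frac{1}{r}\ln\frac{n}{n-k}$ when $n$ and $k$ grow proportionally. The Monte Carlo estimate below also illustrates the mechanism described for Lemma 1: for large groups, the expected maximum of independent order statistics is close to the maximum of their expectations. The function names and parameter values are illustrative.

```python
import math
import random


def expected_order_stat(n, k, rate):
    """Exact E[T_{k:n}] for n i.i.d. Exp(rate): (H_n - H_{n-k}) / rate."""
    return sum(1.0 / (n - j) for j in range(k)) / rate


def quantile_approx(n, k, rate):
    """Large-n approximation: the exponential quantile at k/n, ln(n / (n - k)) / rate."""
    return math.log(n / (n - k)) / rate


def mc_mean_of_max(n_vec, k_vec, rates, trials, seed=0):
    """Monte Carlo estimate of E[max_i T^{(i)}_{k_i:n_i}] for independent groups."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sorted(rng.expovariate(r) for _ in range(n_i))[k_i - 1]
                     for n_i, k_i, r in zip(n_vec, k_vec, rates))
    return total / trials


n_vec, k_vec, rates = [400, 400], [200, 200], [1.0, 2.0]
max_of_means = max(expected_order_stat(n, k, r) for n, k, r in zip(n_vec, k_vec, rates))
print(max_of_means, mc_mean_of_max(n_vec, k_vec, rates, trials=300))
```

The harmonic-number form is exact for every $n$, so comparing it with `quantile_approx` shows how quickly the asymptotic regime is reached.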
Now, we aim to find the task allocation rule that minimizes the computing time of a group code. We define the optimal task allocation vector by
[TABLE]
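whose elements are denoted by $k_i^*$. Before finding $\bm{k}^*$, we state a relationship between $T_{k:n}$ and $\max(T^{(1)}_{k_1:n_1},T^{(2)}_{k_2:n_2})$ in the following lemma. Recall that $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))$ is equivalent to $T_{k:n}$, and the maximum among $T^{(1)}_{k_1:n_1}$ and $T^{(2)}_{k_2:n_2}$ corresponds to $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))$ by definition.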
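Lemma 2.
Under the scenario of computing $k$ tasks on a group system with $L=2$ groups, consider applying an $(\bm{n},\bm{k})$ group code where $\bm{n}=(n_1,n_2)$ and $\bm{k}=(k_1,k_2)$. Given an arbitrary realization of the completion times of the workers, let $T_{k:n}$ be the $k$-th smallest value among the completion times of all $n$ workers. Meanwhile, $T^{(i)}_{k_i:n_i}$ denotes the $k_i$-th smallest value among the completion times of the $n_i$ workers in group $i$. Then, we have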
[TABLE]
Proof.
Let . Consider a subset of set such that and its complementary set . Here, we define . Notice that . Then, we may write . When , we have . Similarly, we have , which leads to . When , we have using the same method as above. For , it is obvious that . This completes the proof. ∎
In the following theorem, we find the optimal task allocation $\bm{k}^*$ and show that the expected computing time of the $(\bm{n},\bm{k}^*)$ group code converges to that of an MDS code for sufficiently large $n$.
Theorem 2.
Consider a scenario of computing $k$ tasks on a group system with $L=2$ groups, where an $(\bm{n},\bm{k})$ group code is applied. In the asymptotic regime of large $n$, the optimal task allocation can be obtained¹ by solving
[TABLE]
Moreover, the expected computing time of the $(\bm{n},\bm{k}^*)$ group code satisfies the following:
[TABLE]
¹ Here we assume that $k_1^*$ is an integer since the task allocation vector consists of integers. If $k_1^*$ is not an integer, the optimal allocation rule is either $\lfloor k_1^*\rfloor$ or $\lceil k_1^*\rceil$, since the asymptotic expected computing time is a convex function of $k_1$, as shown in the proof.
Proof.
Combining (1) and (2), we obtain
[TABLE]
Note that the first argument of the max function is a strictly increasing convex function of $k_1$, while the second one is a strictly decreasing convex function of $k_1$. Thus, taking the maximum of the two results in a convex function of $k_1$. Therefore, as $n$ grows to infinity, the minimizer coincides with the intersection point of the two functions, i.e.,
[TABLE]
From (1) and (6), we obtain (4) by simple algebraic operations. Now we move on to the proof of (5). First, by taking $\lim_{n\to\infty}\mathbb{E}[\cdot]$ of both sides of (3) and applying Lemma 1, we obtain
[TABLE]
When $\bm{k}=\bm{k}^*$, the upper and lower bounds have the same value, as in (6). Thus, by the squeeze theorem, we have
[TABLE]
Therefore, we obtain (5) by using $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))=T_{k:n}$ and $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))=\max_{i\in[L]}T^{(i)}_{k_i:n_i}$. ∎
Recall that an MDS code achieves the optimal computing time, as stated in Theorem 1. The above theorem implies that a group-coded system can asymptotically achieve the optimal computing time by using the optimal task allocation rule $\bm{k}^*$. Note that (4) can be easily solved when by using the quadratic formula. The following corollary provides the optimal task allocation and the corresponding when .
Corollary 1.
Consider the scenario of computing $k$ tasks on a group system with and . When a group code is applied to this system, the optimal task allocation is obtained as
[TABLE]
Moreover, the expected value of the corresponding computing time can be calculated as
[TABLE]
Proof.
When , equation (4) reduces to (7). In addition, inserting (7) into (1) results in (8). ∎
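Since the explicit forms of (4), (7) and (8) are not reproduced here, the sketch below only illustrates the structure of the argument in Theorem 2 under an explicit assumption: that the asymptotic expected time of group $i$ has the exponential-quantile form $\frac{1}{k\mu_i}\ln\frac{n_i}{n_i-k_i}$, where $\mu_i$ is our label for the group parameter (the common factor $1/k$ does not affect the allocation and is dropped). Under that assumption, the optimal $k_1$ is the crossing point of an increasing and a decreasing function, which bisection finds. In the hypothetical case $\mu_2=2\mu_1$ (a worker in group 1 takes twice as long on average, as in the setting of Fig. 3), the crossing condition $\left(\frac{n_1}{n_1-k_1}\right)^2=\frac{n_2}{n_2-k_2}$ becomes the quadratic $n_2a^2+n_1^2a-n_1^2(n-k)=0$ in $a=n_1-k_1$. All function names are hypothetical.

```python
import math


def group_time(n_i, k_i, mu_i):
    """Assumed asymptotic expected time of group i, up to the common factor 1/k."""
    return math.log(n_i / (n_i - k_i)) / mu_i


def optimal_k1_bisect(n1, n2, k, mu1, mu2, iters=100):
    """Continuous k_1 where group 1's (increasing) and group 2's (decreasing) times cross."""
    lo, hi = max(0.0, float(k - n2)), min(float(k), float(n1))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if group_time(n1, mid, mu1) < group_time(n2, k - mid, mu2):
            lo = mid  # group 1 would finish first: give it more tasks
        else:
            hi = mid
    return (lo + hi) / 2


def optimal_k1_quadratic(n1, n2, k):
    """Closed form for mu2 = 2 * mu1: positive root a of n2 a^2 + n1^2 a - n1^2 (n - k) = 0."""
    n = n1 + n2
    a = (-n1 ** 2 + math.sqrt(n1 ** 4 + 4 * n2 * n1 ** 2 * (n - k))) / (2 * n2)
    return n1 - a


print(optimal_k1_bisect(100, 100, 100, 1.0, 2.0))  # approximately 38.1966
print(optimal_k1_quadratic(100, 100, 100))         # approximately 38.1966
```

The bisection works for any pair of rates, while the closed form only covers the ratio-2 case; agreement between the two is a quick sanity check of the algebra.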
IV-B Numerical Results when the Number of Nodes Is Finite
Here, we provide simulation results on the computing time of a group code when the number of nodes is finite. Fig. 3 illustrates the expected computing time of an MDS code and that of a group code for various $n$. We consider two types of group codes: one with the optimal task allocation $\bm{k}^*$, and the other with an even task allocation . For a fixed number of tasks , we assume that workers are divided into two groups as . Moreover, the average computing time of a worker in the first group is double that in the second group, i.e., . For the estimation, we employ Monte Carlo methods with random samples. The simulation result demonstrates that the expected computing time of a group code approaches that of an MDS code in the asymptotic regime of large $n$, as proved in Theorem 2. Moreover, there is a significant gap between the average computing times of the two group codes (the optimal group code and a naive group code), which supports the necessity of a careful task allocation that considers the heterogeneity of groups.
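A Monte Carlo experiment in the spirit of Fig. 3 can be sketched as follows. The paper's exact simulation parameters are not reproduced here, so the values below ($n_1=n_2=100$, $k=100$, group parameters $\mu_1=1$ and $\mu_2=2$, 300 trials) are hypothetical. Each worker in group $i$ is assumed to have an exponential completion time with rate $k\mu_i$, and the integer allocation is chosen by brute force under the assumption that the asymptotic time of group $i$ is proportional to $\frac{1}{\mu_i}\ln\frac{n_i}{n_i-k_i}$. Because the MDS and group-code times are computed from the same realizations, $T_{k:n}\le\max_i T^{(i)}_{k_i:n_i}$ holds sample by sample.

```python
import math
import random


def asym_group_time(n_i, k_i, mu_i):
    """Assumed asymptotic time of group i (common 1/k factor dropped); inf if k_i = n_i."""
    return math.log(n_i / (n_i - k_i)) / mu_i if k_i < n_i else math.inf


def best_integer_split(n1, n2, k, mu1, mu2):
    """Integer k_1 (with k_2 = k - k_1) minimizing the larger assumed group time."""
    candidates = range(max(1, k - n2), min(k - 1, n1) + 1)
    return min(candidates, key=lambda k1: max(asym_group_time(n1, k1, mu1),
                                              asym_group_time(n2, k - k1, mu2)))


def simulate(n_vec, mu_vec, k, k_vec, trials, seed=0):
    """Monte Carlo means of T_comp for the (n, k) MDS code and a group code with k_vec."""
    rng = random.Random(seed)
    mds_total = group_total = 0.0
    for _ in range(trials):
        # Completion times: exponential with rate k * mu_i for every worker of group i.
        groups = [[rng.expovariate(k * mu) for _ in range(n_i)] for n_i, mu in zip(n_vec, mu_vec)]
        mds_total += sorted(t for g in groups for t in g)[k - 1]
        group_total += max(sorted(g)[k_i - 1] for g, k_i in zip(groups, k_vec))
    return mds_total / trials, group_total / trials


n_vec, mu_vec, k = [100, 100], [1.0, 2.0], 100
k1 = best_integer_split(n_vec[0], n_vec[1], k, *mu_vec)
mds, opt = simulate(n_vec, mu_vec, k, [k1, k - k1], trials=300)
_, even = simulate(n_vec, mu_vec, k, [k // 2, k - k // 2], trials=300)
print(k1, mds, opt, even)
```

Using the same seed for both calls keeps the realizations identical, so the gap between `opt` and `even` reflects only the allocation and not sampling noise between runs.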
V Computing Time Analysis for General $L$
V-A Computing Time for an Arbitrary Task Allocation
This section provides the expected computing time of a group code for an arbitrary number of groups $L$. The following lemma provides a numerical way to obtain the expected computing time as $n$ grows to infinity.
Lemma 3.
Consider a group system with $L$ groups. Then, the expected computing time of an $(\bm{n},\bm{k})$ group code satisfies the following:
[TABLE]
Proof.
The proof is given in Appendix B. ∎
This lemma shows that the expected computing time of a group code can be easily obtained once the system parameters and the task allocation are given.
V-B Optimizing Task Allocation
In this subsection, we present the optimal task allocation rule for given parameters and . Before optimizing the task allocation vector $\bm{k}$, we provide a relationship between the order statistics $T_{k:n}$ and $T^{(i)}_{k_i:n_i}$.
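Lemma 4.
Under the scenario of computing $k$ tasks on a group system with $L$ groups, consider applying an $(\bm{n},\bm{k})$ group code where $\bm{n}=(n_1,\dots,n_L)$ and $\bm{k}=(k_1,\dots,k_L)$. Given an arbitrary realization of the completion times of the workers, let $T_{k:n}$ be the $k$-th smallest value among the completion times of all $n$ workers. Meanwhile, $T^{(i)}_{k_i:n_i}$ denotes the $k_i$-th smallest value among the completion times of the $n_i$ workers in group $i$. Then, we have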
[TABLE]
Proof.
The proof can be found in Appendix C. ∎
Recall that $T_{k:n}$ is the computing time of an MDS code and the upper bound $\max_{i\in[L]}T^{(i)}_{k_i:n_i}$ is the computing time of a group code. Theorem 3 now specifies the optimal task allocation $\bm{k}^*$ defined in (2) when there are $L$ groups, and compares the computing times of the group code and the MDS code.
Theorem 3.
Consider the scenario of computing $k$ tasks on a group system with $L$ groups, where an $(\bm{n},\bm{k})$ group code is applied. In the asymptotic regime of large $n$, the optimal task allocation can be obtained² by solving the following system of equations:
[TABLE]
Moreover, the corresponding expected computing time of the $(\bm{n},\bm{k}^*)$ group code is equal to that of an MDS code as $n$ goes to infinity, i.e.,
[TABLE]
² Here we assume that $k_i^*$ is an integer for $i\in[L]$, since the task allocation vector consists of integer values. If $k_i^*$ is not an integer, we can use the round function to set the allocation. For reasonably large system sizes, this rounding has a negligible impact on the overall performance.
Proof.
The proof is given in Appendix D. ∎
Recall that an MDS code is optimal in terms of computing time. The above theorem illustrates that an group code can asymptotically achieve the optimal computing time when the tasks are optimally allocated, i.e. .
VI Decoding Time Analysis
Now we compare the decoding complexity of the suggested group code to that of an MDS code. We assume that the decoding complexity of an MDS code is for ³. Then, the suggested group code has a decoding complexity of by virtue of parallel decoding, where $k_{\max}=\max_{i\in[L]}k_i$. Note that the decoding complexities of the two schemes grow with different orders. For comparison, we define the ratio of the two orders as
[TABLE]
Note that the ratio can be minimized down to when we have .
³ According to recent works [20, 21] on decoding algorithms, practical scenarios satisfy .
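The comparison of decoding orders can be made concrete under an assumption consistent with the text: if an $(n,k)$ MDS code is decoded in $O(k^\beta)$ time and the group code decodes its $L$ MDS codes in parallel, so that its cost is governed by $k_{\max}$, the ratio of the two orders is $(k_{\max}/k)^\beta$, which is smallest, namely $L^{-\beta}$, when the tasks are split evenly. The exponent $\beta=1.5$ below is purely illustrative, and `decoding_order_ratio` is a hypothetical helper.

```python
import math


def decoding_order_ratio(k_vec, beta):
    """(k_max / k)**beta: assumed group-code decoding order relative to an (n, k) MDS code."""
    k = sum(k_vec)
    return (max(k_vec) / k) ** beta


beta = 1.5  # illustrative exponent
for k_vec in ([100], [50, 50], [25, 25, 25, 25], [40, 30, 20, 10]):
    print(k_vec, decoding_order_ratio(k_vec, beta))

# An even split over L = 4 groups attains L**(-beta); an uneven split is limited by its largest k_i.
assert math.isclose(decoding_order_ratio([25, 25, 25, 25], beta), 4 ** -beta)
```

The uneven allocation `[40, 30, 20, 10]` shows why heterogeneous groups, whose optimal allocations are typically uneven, achieve a ratio above the even-split minimum.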
Fig. 4 illustrates under two different scenarios for given and . In both scenarios, and are randomly generated. Moreover, the task allocations for both scenarios are selected as the optimal $\bm{k}^*$, depending on the given parameters of and . Motivated by the practical setting where the size of each group and the average computing time of each worker are bounded, we set and with uniform distributions. Scenarios 1 and 2 differ in the rule for ordering the elements of and , as described below. For scenario 1, we sort the elements of and in ascending and descending order, respectively. In other words, and hold for all . This is the scenario in which a group with a shorter average response time has fewer workers. In scenario 2, both and are sorted in ascending order, i.e., and hold for . This is the scenario in which a group with a shorter average response time has more workers. Under these scenarios, we obtain the average values of for samples when . The simulations of the two scenarios are compared to the minimum achievable . Moreover, we plot a trend line, which starts from the point of Scenario 2 for and grows by a factor of .
Fig. 4 shows that diminishes along with the trend line under both scenarios as grows. Combining this with the definition of , we observe that is inversely proportional to in practical scenarios. Moreover, the proposed group code provides a significant reduction in decoding complexity in both scenarios. For example, when , a group code already achieves a roughly tenfold reduction in decoding complexity compared to an MDS code.
Now we compare the total execution time of the suggested group code to those of existing schemes by simulation. We represent the total execution time as , where the coefficient indicates the relative weight of the decoding complexity compared to the computing time. We simulate the computing of tasks on a group system with and , which leads to with groups. For varying , we observe the execution times of the MDS code, the product code, and the suggested group code with the parameters listed in Table I. The decoding complexity of the product code is because the decoding procedure consists of decoding MDS codes, where the dimension of each MDS code is . For the group code, we use the optimal task allocation rule $\bm{k}^*$. For the decoding complexity, we use a parameter of .
Fig. 5(a) and Fig. 5(b) show the simulated execution times for different regimes of . Fig. 5(a) illustrates the situation where the computing time is dominant, i.e., is small. When is the lowest in Fig. 5(a), the MDS code gives the smallest execution time, followed by the group code and then the product code. This agrees with the two mathematical results shown above: the optimality of the MDS code in Theorem 1 and the asymptotic optimality of the group code in Theorem 3. Note that the coding scheme that gives the best execution time changes as varies. Meanwhile, Fig. 5(b) represents the situation where the decoding complexity dominates the execution time. Notice that the execution time of the MDS code becomes inferior to those of the other schemes due to its huge decoding complexity as grows. On this computing system, the group code gives the best execution time in all regimes of . In general, the order of determines which of the group code and the product code has the lower decoding complexity. Recall that the decoding complexities of the group code and the product code are and , respectively. Thus, the decoding complexity of the group code is lower than that of the product code when
[TABLE]
holds. Recall that is inversely proportional to under practical scenarios, as shown in Fig. 4. Thus, the condition in (11) reduces to . This implies that when a system has a sufficiently large number of groups, the group code outperforms the product code in terms of decoding complexity.
VII Conclusion
In this paper, we propose a coded computation scheme appropriate for a practical model, which reflects the tree-shaped structure and the heterogeneity of groups. Specifically, we consider systems with heterogeneous groups that have distinct computing time statistics and different numbers of workers. We prove that the suggested group-coded scheme can asymptotically achieve the optimal computing time as $n$ grows to infinity. In the regime of finite $n$, numerical results show that the suggested scheme also provides a near-optimal computing time. Moreover, the suggested scheme can reduce the decoding complexity down to a factor of , where , compared to the existing MDS-coded scheme. Finally, the total execution time (the sum of the computing time and the decoding time) of the suggested scheme is numerically shown to outperform other existing state-of-the-art coding schemes.
Appendix A Proof of Lemma 1
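We first show that $\max(T^{(1)}_{k_1:n_1},T^{(2)}_{k_2:n_2})$ is determined as one of $T^{(1)}_{k_1:n_1}$ and $T^{(2)}_{k_2:n_2}$ for sufficiently large $n$, and thereby the expected value of the maximum is determined as the maximum among the expected values of $T^{(1)}_{k_1:n_1}$ and $T^{(2)}_{k_2:n_2}$.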
First, consider the order statistic of i.i.d. random variables , whose probability density function (PDF) and cumulative distribution function (CDF) are denoted by and . We represent an empirical CDF obtained with samples as . According to [22], can be represented as
[TABLE]
where and the third term satisfies In [22], it is shown that , where . Thus, we have
Now, we examine the convergence of by using (12). Let and be the PDF and CDF of an exponential random variable with rate , and define as for , i.e.
[TABLE]
Then, we can think of the asymptotic distribution of the as follows:
[TABLE]
where for . By the definition of convergence in distribution, for any , we have
[TABLE]
Then, the convergence of into can be derived as follows.
[TABLE]
This means converges in probability towards the constant as , i.e.
[TABLE]
This shows that, for sufficiently large $n$, the order of two independent order statistics agrees with the order of their mean values, due to the convergence. Consequently, the sign of loses its randomness and is determined in the asymptotic regime of large $n$. Therefore, we can claim that
[TABLE]
This equation indicates that in the asymptotic regime of large $n$, the random variable , which has a cumbersome distribution, can be substituted with , which is a binary value that can be easily calculated.
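The convergence argument above, namely that for large $n$ the order of two independent order statistics agrees with the order of their means, can be illustrated by simulation. This hypothetical sketch uses two independent groups of equal size $n$, takes the $\lfloor n/2\rfloor$-th order statistic in each, and uses exponential rates 1 and 1.5 so that the first group has the larger mean; the empirical probability that the first realization is also the larger one should approach 1 as $n$ grows.

```python
import random


def kth_smallest(values, k):
    """k-th order statistic (1-indexed)."""
    return sorted(values)[k - 1]


def prob_order_matches_means(n, rate1, rate2, trials, seed=0):
    """Empirical P[T^(1)_{k:n} > T^(2)_{k:n}] with k = n // 2, for two independent exponential groups.

    With rate1 < rate2, group 1 has the larger mean, so this probability should tend to 1.
    """
    rng = random.Random(seed)
    k = n // 2
    hits = 0
    for _ in range(trials):
        t1 = kth_smallest([rng.expovariate(rate1) for _ in range(n)], k)
        t2 = kth_smallest([rng.expovariate(rate2) for _ in range(n)], k)
        hits += t1 > t2
    return hits / trials


for n in (10, 100, 400):
    print(n, prob_order_matches_means(n, rate1=1.0, rate2=1.5, trials=500))
```

For small $n$ the ordering is often reversed, while for a few hundred workers per group it essentially never is, which is why the maximum can be replaced by a deterministic choice in the limit.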
Now we prove the statement of Lemma 1 by using (15) as follows.
[TABLE]
Equality holds since limit and expectation can be interchanged when the random variable is non-negative, which is satisfied because . Equality holds by (15). Note that this proof can be directly applied to the min function of two independent order statistics instead of max function.
Appendix B Proof of Lemma 3
We prove the statement by mathematical induction. The base case $L=2$ is already proved in Lemma 1. Now, we show that if the statement is true for an arbitrary $L$, then it still holds for $L+1$. Before moving on to the proof, we establish the convergence of the max function, which is necessary for the proof. Recall that equation (15) shows that the order of two independent order statistics is determined by their expected values for sufficiently large $n$. Thus, we can claim that for arbitrary , the following statement is true.
[TABLE]
This leads to
[TABLE]
where In other words, the maximum of independent order statistics is determined as the one that has the largest expectation value for sufficiently large .
We now move on to the inductive step, assuming the statement holds for $L$ as
[TABLE]
Now, we show that the statement holds for $L+1$ as well:
[TABLE]
Equality holds since becomes the one whose expected value is the largest for sufficiently large $n$, as shown in (16). Equality follows from Lemma 1, since it is equivalent to the case when . Equality holds by the assumption (17). Thus, we have
[TABLE]
which completes the whole proof of this lemma. Similarly, we can show
[TABLE]
Appendix C Proof of Lemma 4
Suppose there are three groups. Then, for an arbitrary realization of the completion times, the following inequalities hold by Lemma 2.
[TABLE]
We can modify the lower bound by using an obvious inequality to obtain
[TABLE]
Thus, we have
[TABLE]
We can prove the statement for an arbitrary number of groups $L$ by repeating this process. Thus, we have
[TABLE]
Appendix D Proof of Theorem 3
We first prove that the best task allocation rule $\bm{k}^*$ satisfies the following equations:
[TABLE]
Then, we show that the $(\bm{n},\bm{k}^*)$ group code achieves the same computing time as an MDS code in the asymptotic regime of large $n$. Afterwards, we prove the existence and uniqueness of $\bm{k}^*$.
First, we rewrite the statement (9) of Lemma 3 with respect to $\lim_{n\to\infty}\mathbb{E}[T^{(j)}_{k_j:n_j}]$ for $j\in[L]$ as follows:
[TABLE]
Note that the first argument of the max function is a strictly increasing convex function of $k_j$, whereas the second argument $\max_{i\neq j}\lim_{n\to\infty}\mathbb{E}[T^{(i)}_{k_i:n_i}]$ is a strictly decreasing convex function of $k_j$, because by (9) it is equivalent to the time for computing $k-k_j$ tasks by using a group code with $L-1$ groups. Hence, taking the maximum of the two arguments results in a convex function that attains its minimum at the intersection of the two. Therefore, the optimal value of $k_j$ satisfies
[TABLE]
We may write as below:
[TABLE]
To satisfy the above inequality for all and , the optimal task allocation must satisfy equation (18).
Next, we consider the following bounds, which are obtained by taking $\lim_{n\to\infty}\mathbb{E}[\cdot]$ of the bounds in Lemma 4 and applying Lemma 3:
[TABLE]
For $\bm{k}=\bm{k}^*$, the above lower and upper bounds have an equal value by (18). Hence, $\lim_{n\to\infty}\mathbb{E}[T_{k:n}]$ and $\max_{i\in[L]}\lim_{n\to\infty}\mathbb{E}[T^{(i)}_{k_i^*:n_i}]$ have the same value; these correspond to $\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))]$ and $\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^*))]$, respectively. Thus, we have proved
[TABLE]
Lastly, we prove the existence and uniqueness of $\bm{k}^*$. Note that the interval of is confined as due to the conditions and . By inserting equation (13) into (18), the following equation is obtained for :
[TABLE]
Thus, we may write the following equation which consists of a single variable .
[TABLE]
For simplicity, we denote the right-hand side by . Note that is a strictly increasing function of . Thus, we can complete the proof if we show that starts from a value lower than and reaches a value greater than in the given interval. First, when the lower bound is [math], it is obvious that . The other case, when , is also easily proved as follows:
[TABLE]
Similarly, when the upper bound is , one can easily show that . The other case of , i.e. , also satisfies as follows.
[TABLE]
We complete the proof by showing that for the lower bound and for the upper bound , which guarantees that the strictly increasing function and the constant function with intersect exactly once.
References
- [1] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., “Large scale distributed deep networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1223–1231.
- [2] J. Dean and L. A. Barroso, “The tail at scale,” Communications of the ACM, vol. 56, no. 2, pp. 74–80, 2013.
- [3] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2018.
- [4] K. Lee, C. Suh, and K. Ramchandran, “High-dimensional coded matrix multiplication,” in Information Theory (ISIT), 2017 IEEE International Symposium on. IEEE, 2017, pp. 2418–2422.
- [5] Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,” in Advances in Neural Information Processing Systems, 2017, pp. 4403–4413.
- [6] T. Baharav, K. Lee, O. Ocal, and K. Ramchandran, “Straggler-proofing massive-scale distributed matrix multiplication with d-dimensional product codes,” 2018.
- [7] N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, “Gradient coding from cyclic MDS codes and expander graphs,” arXiv preprint arXiv:1707.03858, 2017.
- [8] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning, 2017, pp. 3368–3376.
