Coded Matrix Multiplication on a Group-Based Model

Muah Kim; Jy-yong Sohn; Jaekyun Moon

arXiv:1901.05162 · cs.IT · January 17, 2019


TL;DR

This paper introduces a group-based coding scheme for distributed matrix multiplication that models real-world server clusters, achieving near-optimal performance and reduced decoding complexity.

Contribution

It proposes a novel group code tailored for clustered distributed systems, reflecting practical conditions and improving decoding efficiency.

Findings

1. Achieves asymptotic optimality in large-scale regimes.
2. Demonstrates near-optimal performance for finite system sizes.
3. Reduces decoding complexity through parallel decoding.


Tables

Table I: Code parameters and decoding complexities of various coding schemes used for the simulation.

Code	Decoding complexity	Code parameters
MDS	$\mathcal{O}(k^{\beta})$	$(n,k)=(900,400)$
Product	$\mathcal{O}((\sqrt{k})^{\beta+1})$	$(\sqrt{n},\sqrt{k})^{2}=(30,20)^{2}$
Group	$\mathcal{O}(k_{\max}^{\beta})$	$\bm{n}=[180,170,160,140,130,120]$, $\bm{k}=\bm{k}^{*}=[71,71,70,65,63,60]$

Equations

$$T_{\mathrm{exec}}(C)=T_{\mathrm{comp}}(C)+T_{\mathrm{dec}}(C),$$

$$T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))=\max\big(T_{k_{1}:n_{1}}^{(1)},T_{k_{2}:n_{2}}^{(2)},\dots,T_{k_{L}:n_{L}}^{(L)}\big).$$

$$\mathrm{P}_{\mathrm{main}}:\ \text{compute}\ \lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))].$$

$$T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))\leq T_{\mathrm{comp}}(C).$$

$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]=\max\Big(-\frac{1}{k\mu_{1}}\log\Big(1-\frac{k_{1}}{n_{1}}\Big),\,-\frac{1}{k\mu_{2}}\log\Big(1-\frac{k-k_{1}}{n_{2}}\Big)\Big).$$

$$\bm{k}^{*}:=\underset{\bm{k}}{\arg\min}\ \mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))],$$

$$\min\big(T_{k_{1}:n_{1}}^{(1)},T_{k-k_{1}:n_{2}}^{(2)}\big)\leq T_{k:n}\leq\max\big(T_{k_{1}:n_{1}}^{(1)},T_{k-k_{1}:n_{2}}^{(2)}\big).$$

$$k_{1}^{*}+n_{2}-n_{2}\Big(1-\frac{k_{1}^{*}}{n_{1}}\Big)^{\mu_{2}/\mu_{1}}=k.$$

$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]=\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))].$$

$$\lim_{n\to\infty}k_{1}^{*}=\underset{k_{1}\in[k]}{\arg\min}\Big\{\max\Big(-\frac{1}{k\mu_{1}}\log\Big(1-\frac{k_{1}}{n_{1}}\Big),\,-\frac{1}{k\mu_{2}}\log\Big(1-\frac{k-k_{1}}{n_{2}}\Big)\Big)\Big\}.$$

$$\lim_{n\to\infty}\mathbb{E}[T_{k_{1}^{*}:n_{1}}^{(1)}]=\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}^{*}:n_{2}}^{(2)}].$$

$$\min\Big(\lim_{n\to\infty}\mathbb{E}[T_{k_{1}:n_{1}}^{(1)}],\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}:n_{2}}^{(2)}]\Big)\leq\lim_{n\to\infty}\mathbb{E}[T_{k:n}]\leq\max\Big(\lim_{n\to\infty}\mathbb{E}[T_{k_{1}:n_{1}}^{(1)}],\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}:n_{2}}^{(2)}]\Big).$$

$$\lim_{n\to\infty}\mathbb{E}[T_{k:n}]=\lim_{n\to\infty}\mathbb{E}[T_{k_{1}^{*}:n_{1}}^{(1)}]=\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}^{*}:n_{2}}^{(2)}].$$

$$k_{1}^{*}=k-n_{2}-\frac{n_{2}^{2}}{2n_{1}}+\sqrt{\Big(n_{2}+\frac{n_{2}^{2}}{2n_{1}}\Big)^{2}-\frac{k}{n_{1}}n_{2}^{2}}.$$

$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]=\frac{1}{k\mu_{2}}\log\Bigg(\sqrt{\Big(1+\frac{n_{2}}{2n_{1}}\Big)^{2}-\frac{k}{n_{1}}}-\frac{n_{2}}{2n_{1}}\Bigg)^{-1}.$$

$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]=\max_{i\in[L]}\Big(-\frac{1}{k\mu_{i}}\log\Big(1-\frac{k_{i}}{n_{i}}\Big)\Big).$$

$$\min_{i\in[L]}T_{k_{i}:n_{i}}^{(i)}\leq T_{k:n}\leq\max_{i\in[L]}T_{k_{i}:n_{i}}^{(i)}.$$

$$k_{i}^{*}+\sum_{j\neq i}n_{j}\Big(1-\Big(1-\frac{k_{i}^{*}}{n_{i}}\Big)^{\mu_{j}/\mu_{i}}\Big)=k.$$

$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]=\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))].$$

$$\rho_{\mathrm{dec}}=\Big(\frac{k_{\max}}{k}\Big)^{\beta}.$$

$$k_{\max}=\mathcal{O}\big(k^{1+\frac{1}{\beta}}\big)$$

$$T_{k:n}=\xi-\frac{F_{n}(\xi)-k/n}{f(\xi)}+R_{n},$$

$$\xi^{(i)}=F_{(i)}^{-1}(k_{i}/n_{i})=-\frac{1}{k\mu_{i}}\log\Big(1-\frac{k_{i}}{n_{i}}\Big)$$

$$T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}-(\xi^{(1)}-\xi^{(2)})=\ \dots$$

$$\lim_{n\to\infty}\Pr\{T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}-(\xi^{(1)}-\xi^{(2)})\leq\epsilon\}=\lim_{n\to\infty}\Phi\Big(\epsilon\frac{n}{V}\Big).$$

$$\lim_{n\to\infty}\Pr\big(\lvert T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}-(\xi^{(1)}-\xi^{(2)})\rvert\geq\epsilon\big)\ \dots$$


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Cooperative Communication and Network Coding · Error Correcting Code Techniques

Full text

Coded Matrix Multiplication on a Group-Based Model

Muah Kim

School of Electrical Engineering, KAIST, Daejeon, Republic of Korea

Jy-yong Sohn

School of Electrical Engineering, KAIST, Daejeon, Republic of Korea

Jaekyun Moon

School of Electrical Engineering, KAIST, Daejeon, Republic of Korea

Abstract

Coded distributed computing has been considered a promising technique for making large-scale systems robust to “straggler” workers. Yet, no practical system model for distributed computing has been available that reflects the clustered or grouped structure of real-world computing servers. Nor have the large variations in computing power and bandwidth across different servers been properly modeled. We suggest a group-based model to reflect practical conditions and develop an appropriate coding scheme for this model. The suggested code, called group code, employs parallel encoding for each group. We show that the suggested coding scheme can asymptotically achieve optimal computing time in the regime of infinite $n$, the number of workers. While the theoretical analysis is conducted in the asymptotic regime, numerical results also show that the suggested scheme achieves near-optimal computing time for any finite but reasonably large $n$. Moreover, we demonstrate that the decoding complexity of the suggested scheme is significantly reduced by virtue of parallel decoding.

I Introduction

In the era of big data, distributed computing has been recognized as a solution for realizing large-scale machine learning [1]. Unlike conventional centralized systems, a distributed computing system divides the computational work into subtasks and distributes them over multiple nodes. This system successfully supports large-scale machine learning by reducing the computing time via parallel computing.

Yet, there is still room for improvement, as the system is slowed down by the random nature of computing nodes, where certain nodes are inevitably slower than others. In particular, the performance of a distributed system has been shown to degrade dramatically because of the slowest workers, the “stragglers”, whose latencies lie in the tail of the latency distribution [2]. Lee et al. suggested coded computation as a straggler-proof scheme, which speeds up matrix multiplication by introducing redundancy with a maximum distance separable (MDS) code [3]. Subsequently, coded computation has been shown to effectively improve the performance of computing systems for matrix-matrix multiplication [4, 5, 6], distributed gradient descent [7, 8], convolution [9], Fourier transform [10], and matrix sparsification [11, 12]. Moreover, for matrix multiplication, new models reflecting the practical environment of computing systems, such as tree structure and heterogeneity, have been suggested and analyzed [13, 14].

In recent years, distributed cloud computing services such as Amazon EC2 have enabled customers to handle large-scale computation [15]. Real distributed computing systems generally adopt a multi-rack structure, where the computing workers are grouped together in multiple racks [16, 17, 18]. Moreover, in the real world, the workers’ latency statistics are heterogeneous due to the mixed use of hardware with varying performance or the dynamics of multiple user requests over shared resources [19]. So far, the homogeneous grouped structure has been considered in [14], and heterogeneous workers without a grouped structure have been studied in [13]. However, system solutions that reflect both practical conditions, namely grouped structure and heterogeneity (in terms of both the number of workers in each group and the bandwidth of the communication links associated with the groups), are yet to be established.

I-A Main Contributions

We design a group-based computing model as shown in Fig. 1, where $n$ workers are dispersed into $L$ groups, each having a different number of nodes and distinct computing time statistics. We assume that group $i$ has $n_{i}$ nodes, each of which has a computing time given by an exponential random variable with rate $\mu_{i}$ . This is a more practical model than the existing ones because it resembles the tree-shaped (grouped) distributed computing systems such as the Hadoop file system while also considering the heterogeneity of the groups.

Considering the scenario of computing $k$ tasks in the suggested model, we show that an $(n,k)-$ MDS code achieves the optimal computing time. Yet, this scheme requires a prohibitive decoding complexity as $k$ increases. In addition, it is hard to obtain a closed-form expression for the optimal computing time due to the heterogeneous nature of the model.

To address these issues, we propose a coding scheme called group code which divides the total $k$ tasks into $L$ partitions and then employs $L$ distinct MDS codes. We show that a carefully designed group code can asymptotically achieve the optimal computing time as $n$ goes to infinity. In addition, the suggested group code can reduce the decoding complexity down to a factor of $(\frac{1}{L})^{\beta}$ compared to an $(n,k)-$ MDS code, where $\beta>1$ . Furthermore, we obtain a closed-form expression for the expected optimal computing time, when the number of workers $n$ goes to infinity.

I-B Related Works

Previous works on coded computation either achieve the optimal computing time with a prohibitive decoding complexity or reduce the decoding complexity at the sacrifice of optimality in computing time. In addition, most of them assume homogeneous workers. Applying an $(n,k)-$ MDS code in homogeneous systems is suggested by [3], which achieves the optimal computing time but requires a huge decoding complexity as $k$ increases. Considering a system model with heterogeneous workers, the authors of [13] suggested a coding scheme that achieves an asymptotically optimal computing time. However, its decoding process has a computational complexity of $\mathcal{O}(k^{3})$. Moreover, the coding schemes suggested in [4, 14, 6] encode the tasks along multiple dimensions, which can effectively reduce the decoding complexity by virtue of parallel decoding or a peeling decoding scheme. However, these codes lose the MDS property and thereby cannot achieve the optimal computing time. Besides, these codes do not provide solutions for practical systems with heterogeneous groups. Compared to these existing works, our suggested scheme is shown not only to asymptotically achieve the optimal computing time but also to require a low decoding complexity.

I-C Notations

Here, we list the mathematical notations used in this paper. For a positive integer $n$, the set of positive integers less than or equal to $n$ is denoted by $[n]=\{1,2,\dots,n\}$. For a matrix $\mathbf{A}$ with multiple rows, $\mathbf{A}=[\mathbf{A}_{1};\mathbf{A}_{2}]$ represents a row-wise division of $\mathbf{A}$, i.e., $\mathbf{A}^{T}=[\mathbf{A}_{1}^{T}\mathbf{A}_{2}^{T}]$. We use $C_{\mathrm{G}}(\bm{n},\bm{k})$ to denote an $(\bm{n},\bm{k})-$ group code and $C_{\mathrm{MDS}}(n,k)$ to denote an $(n,k)-$ MDS code. The definition of the $(\bm{n},\bm{k})-$ group code is given in Section II-A. We denote the floor, ceiling, and rounding functions of a real value $x$ by $\lfloor x\rfloor$, $\lceil x\rceil$, and $\lfloor x\rceil$, respectively.

II System Model and Target Problem

II-A System Model

Consider $n$ workers that are spread across $L$ groups as shown in Fig. 1. Here, group $i$ has $n_{i}$ workers whose response times are i.i.d. random variables with parameter $\mu_{i}$. We define this system as an $(\bm{n},\bm{\mu})-$ group system, where $\bm{n}=[n_{1},n_{2},\dots,n_{L}]$ and $\bm{\mu}=[\mu_{1},\mu_{2},\dots,\mu_{L}]$. For simplicity, we denote the $j^{th}$ worker in group $i$ by $w(i,j)$ for $i\in[L]$ and $j\in[n_{i}]$. We implement a matrix-vector multiplication $\mathbf{A}\bm{x}$ on this system, where $\mathbf{A}\in\mathbb{R}^{m\times d}$ is a work matrix and $\bm{x}\in\mathbb{R}^{d\times 1}$ is an input vector for some positive integers $m$ and $d$. The work matrix $\mathbf{A}$ is divided into $k$ equal-sized submatrices as $\mathbf{A}=[\mathbf{A}_{1};\mathbf{A}_{2};\cdots;\mathbf{A}_{k}]$, where $k$ is a positive integer that divides $m$, and $\mathbf{A}_{r}\in\mathbb{R}^{\frac{m}{k}\times d}$ for $r\in[k]$.

The task of computing $\mathbf{A}\bm{x}$ is distributed to the $n$ workers as follows. First, we define $\bm{k}=[k_{1},k_{2},\dots,k_{L}]$ as a task allocation vector, whose elements are positive integers satisfying $\sum_{i=1}^{L}k_{i}=k$. The set of submatrices $\{\mathbf{A}_{r}\}_{r=1}^{k}$ is then partitioned into $L$ disjoint subsets $\{\mathbb{S}_{i}\}_{i=1}^{L}$ such that $\lvert\mathbb{S}_{i}\rvert=k_{i}$ holds for $i\in[L]$. We denote the elements of set $\mathbb{S}_{i}$ by $\mathbb{S}_{i}=\{\mathbf{A}^{(i)}_{j}\}_{j=1}^{k_{i}}$. Next, the $k_{i}$ elements of $\mathbb{S}_{i}$ are encoded with an $(n_{i},k_{i})-$ MDS code, and we denote the set of $n_{i}$ coded submatrices by $\widetilde{\mathbb{S}}_{i}=\{\widetilde{\mathbf{A}}_{j}^{(i)}\}_{j=1}^{n_{i}}$. Worker $w(i,j)$ stores $\widetilde{\mathbf{A}}_{j}^{(i)}$ and computes $\widetilde{\mathbf{A}}_{j}^{(i)}\bm{x}$ when it receives the input vector $\bm{x}$ from the master. We call this coding scheme an $(\bm{n},\bm{k})-$ group code, denoted by $C_{\mathrm{G}}(\bm{n},\bm{k})$. Fig. 2 illustrates an example of an $(\bm{n},\bm{k})-$ group code when $\bm{n}=[3,4]$ and $\bm{k}=[2,3]$. The matrix $\mathbf{A}=[\mathbf{A}_{1};\mathbf{A}_{2};\dots;\mathbf{A}_{5}]$ is divided into two sets of submatrices, $\{\mathbf{A}_{1},\mathbf{A}_{2}\}$ and $\{\mathbf{A}_{3},\mathbf{A}_{4},\mathbf{A}_{5}\}$. Then, by applying a $(3,2)-$ MDS code and a $(4,3)-$ MDS code, respectively, we obtain $\{\widetilde{\mathbf{A}}_{1}^{(1)},\widetilde{\mathbf{A}}_{2}^{(1)},\widetilde{\mathbf{A}}_{3}^{(1)}\}$ and $\{\widetilde{\mathbf{A}}_{1}^{(2)},\widetilde{\mathbf{A}}_{2}^{(2)},\widetilde{\mathbf{A}}_{3}^{(2)},\widetilde{\mathbf{A}}_{4}^{(2)}\}$. Each worker individually transmits its computational result $\widetilde{\mathbf{A}}_{j}^{(i)}\bm{x}$ to the master when its computation is finished.
To obtain the computational output $\mathbf{A}\bm{x}$, the master needs at least $k_{i}$ computational results from each group $i$ to decode the $(n_{i},k_{i})-$ MDS code. Note that this model can be directly applied to matrix-matrix multiplication, where the input vector $\bm{x}$ is replaced by a matrix $\mathbf{B}\in\mathbb{R}^{d\times c}$.
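The encoding and decoding steps above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' construction: each group's $(n_{i},k_{i})-$ MDS code is realized by a random Gaussian generator matrix (any $k_{i}$ of its rows are invertible with probability 1 over the reals), and the helper names `mds_generator`, `group_encode`, and `group_decode` are hypothetical. The usage at the bottom follows the Fig. 2 setting with $\bm{n}=[3,4]$ and $\bm{k}=[2,3]$.

```python
import numpy as np


def mds_generator(n_i, k_i, rng):
    # Assumption: a random Gaussian n_i x k_i generator. Any k_i of its rows are
    # linearly independent with probability 1, so it acts as a real-valued
    # (n_i, k_i)-MDS code (numerical conditioning aside).
    return rng.standard_normal((n_i, k_i))


def group_encode(blocks, n_vec, k_vec, rng):
    """Partition the k row-blocks into L groups and encode each group separately."""
    groups, start = [], 0
    for n_i, k_i in zip(n_vec, k_vec):
        G = mds_generator(n_i, k_i, rng)
        S_i = np.stack(blocks[start:start + k_i])      # (k_i, m/k, d): the set S_i
        coded = np.tensordot(G, S_i, axes=(1, 0))      # (n_i, m/k, d): one block per worker
        groups.append((G, coded))
        start += k_i
    return groups


def group_decode(groups, finished, x):
    """Recover A x from any k_i finished workers in every group i."""
    parts = []
    for (G, coded), idx in zip(groups, finished):
        # Results returned by the finished workers w(i, j), j in idx.
        results = np.stack([coded[j] @ x for j in idx])  # (k_i, m/k)
        # Invert the k_i x k_i submatrix of G to get the uncoded products A_l^(i) x.
        parts.append(np.linalg.solve(G[idx], results))
    return np.concatenate(parts, axis=0).ravel()


rng = np.random.default_rng(0)
m, d = 10, 4                         # k = 5 blocks of m/k = 2 rows each
A = rng.standard_normal((m, d))
x = rng.standard_normal(d)
n_vec, k_vec = [3, 4], [2, 3]        # the Fig. 2 example
blocks = np.split(A, sum(k_vec))     # A_1, ..., A_5
groups = group_encode(blocks, n_vec, k_vec, rng)
finished = [[0, 2], [1, 2, 3]]       # w(1,2) and w(2,1) are stragglers
y = group_decode(groups, finished, x)
print(np.allclose(y, A @ x))
```

Each group is decoded independently, which is what later allows the $L$ decoders to run in parallel.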

We adopt the exponential distribution model for the completion time of a worker, defined as the time taken both to compute and to transmit the computed result to the master. This model has also been assumed in other papers on coded computation [4, 14]. Unlike these papers, however, a worker in group $i$ has the distribution parameter $\mu_{i}$, which varies across groups. More precisely, the completion time $T_{j}^{(i)}$ of worker $w(i,j)$ has the cumulative distribution function $\Pr[T_{j}^{(i)}\leq t]=1-e^{-k\mu_{i}t}$ for $t\geq 0$. Here, the completion time has rate $k\mu_{i}$ because the number of rows in each submatrix $\mathbf{A}_{r}\in\mathbb{R}^{\frac{m}{k}\times d}$ shrinks as $k$ increases.

II-B Target Problem

This paper mainly aims to analyze the total execution time $T_{\mathrm{exec}}$ of $(\bm{n},\bm{k})-$ group codes, which refers to the entire time taken for computing and decoding. The computing time $T_{\mathrm{comp}}$ is the time taken for the master to gather the results of the computational subtasks from the workers, while the decoding time $T_{\mathrm{dec}}$ is the time taken to recover the original task of computing $\mathbf{A}\bm{x}$ from the gathered results. In this paper, we assume that the encoding time is negligible compared to $T_{\mathrm{comp}}$ and $T_{\mathrm{dec}}$. This is because we focus on scenarios of multiplying varying input vectors with the same work matrix $\mathbf{A}$, which is encoded once prior to the computation. Thus, we have

$$T_{\mathrm{exec}}(C)=T_{\mathrm{comp}}(C)+T_{\mathrm{dec}}(C),$$

when code $C$ is applied to the system.

We focus on analyzing the computing time of $(\bm{n},\bm{k})-$ group codes, which is denoted by $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))$. Recall that the computing time of an $(\bm{n},\bm{k})-$ group code is the time at which every group $i$ has at least $k_{i}$ workers that have finished their tasks. Let $T_{k_{i}:n_{i}}^{(i)}$ be the $k_{i}^{th}$ smallest value among $\{T_{j}^{(i)}\}_{j=1}^{n_{i}}$. Then, $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))$ can be expressed as

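$$T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))=\max\big(T_{k_{1}:n_{1}}^{(1)},T_{k_{2}:n_{2}}^{(2)},\dots,T_{k_{L}:n_{L}}^{(L)}\big).$$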

Since it is hard to find a closed-form expression for $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]$ when $n$ is finite, we set our main problem as obtaining the expected value as $n$ goes to infinity, i.e.,

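$$\mathrm{P}_{\mathrm{main}}:\ \text{compute}\ \lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))].$$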

Here, we assume $k=\Theta(n)$ and $n_{i}=\Theta(n)$ for $i\in[L]$ .
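To make the computing times concrete, the following NumPy sketch samples the exponential completion times of Section II-A and evaluates the group-code computing time $\max_{i}T^{(i)}_{k_{i}:n_{i}}$ alongside the $k^{th}$ order statistic $T_{k:n}$ over all workers, which is the computing time of an $(n,k)-$ MDS code (Section III). The parameter values and function names are illustrative assumptions. The final check holds realization by realization: at time $\max_{i}T^{(i)}_{k_{i}:n_{i}}$, at least $\sum_{i}k_{i}=k$ workers have finished.

```python
import numpy as np


def sample_completion_times(n_vec, mu_vec, k, trials, rng):
    # Worker w(i, j) finishes after an Exp(k * mu_i) time (mean 1 / (k * mu_i)).
    return [rng.exponential(scale=1.0 / (k * mu), size=(trials, n))
            for n, mu in zip(n_vec, mu_vec)]


def kth_smallest(T, kk):
    # kk-th smallest entry (1-indexed) of every row, i.e. an order statistic per trial.
    return np.partition(T, kk - 1, axis=1)[:, kk - 1]


def t_comp_group(times, k_vec):
    # Group code: done once every group i has k_i finished workers.
    return np.max([kth_smallest(T, k_i) for T, k_i in zip(times, k_vec)], axis=0)


def t_comp_mds(times, k):
    # (n, k)-MDS code over all workers: done at the k-th finisher overall, T_{k:n}.
    return kth_smallest(np.concatenate(times, axis=1), k)


rng = np.random.default_rng(1)
n_vec, mu_vec, k = [300, 100], [1.0, 2.0], 100
times = sample_completion_times(n_vec, mu_vec, k, trials=2000, rng=rng)
t_group = t_comp_group(times, [63, 37])
t_mds = t_comp_mds(times, k)
print(t_mds.mean(), t_group.mean())
print(bool(np.all(t_mds <= t_group)))
```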

III Optimal Computing Time Analysis

Here we find the optimal computing time of a given $(\bm{n},\bm{\mu})-$ group system. Theorem 1 states that applying an $(n,k)-$ MDS code achieves the optimal computing time. Suppose an $(n,k)-$ MDS code is applied to the $k$ submatrices $\{\mathbf{A}_{1},\mathbf{A}_{2},\dots,\mathbf{A}_{k}\}$, resulting in $n$ coded submatrices $\{\widetilde{\mathbf{A}}_{1},\widetilde{\mathbf{A}}_{2},\dots,\widetilde{\mathbf{A}}_{n}\}$. Then, the $n$ coded submatrices are distributed to the $n$ workers regardless of the groups they belong to. Here we denote the computing time of an $(n,k)-$ MDS code by $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))$.

Theorem 1.

Consider computing $k$ tasks on $(\bm{n},\bm{\mu})-$ group systems. Then, an $(n,k)-$ MDS code achieves the optimal computing time. In other words, for an arbitrary $(n,k)$ linear code $C\in\mathcal{C}(n,k)$,

$$T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))\leq T_{\mathrm{comp}}(C).$$

Proof.

Given an arbitrary realization of the completion times $\{T_{j}^{(i)}\}_{i\in[L],j\in[n_{i}]}$ of workers, we can think of their order statistics $T_{1:n}<T_{2:n}<\dots<T_{n:n}$ . Recall that $(n,k)$ linear code $C$ cannot recover the original message if there are more than $n-k$ erasures, which leads to $T_{\mathrm{comp}}(C)\geq T_{k:n}$ . By the MDS property, we have $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))=T_{k:n}$ , which completes the proof. ∎

IV Computing Time Analysis

In this section, we provide the computing time analysis when the workers are dispersed into $L=2$ groups.

IV-A Computing Time for an Arbitrary Task Allocation $\bm{k}$

For simplicity, we denote the task allocation vector as $\bm{k}=[k_{1},k_{2}]=[k_{1},k-k_{1}]$ . The computing time of an $(\bm{n},\bm{k})-$ group code for $L=2$ can be expressed as $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))=\max(T^{(1)}_{k_{1}:n_{1}},T^{(2)}_{k_{2}:n_{2}})$ by definition. Lemma 1 provides the expected computing time of an $(\bm{n},\bm{k})-$ group code when $n$ goes to infinity.

Lemma 1.

Consider an $(\bm{n},\bm{\mu})-$ group system with $L=2$ groups. Then, the expected computing time of an $(\bm{n},\bm{k})-$ group code satisfies the following:

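$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]=\max\Big(-\frac{1}{k\mu_{1}}\log\Big(1-\frac{k_{1}}{n_{1}}\Big),\,-\frac{1}{k\mu_{2}}\log\Big(1-\frac{k-k_{1}}{n_{2}}\Big)\Big).\tag{1}$$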

Proof.

The proof is deferred to Appendix A. ∎

This lemma illustrates that in the asymptotic regime of large $n$ , the expected computing time of an $(\bm{n},\bm{k})-$ group code can be easily obtained for given $\bm{n}$ , $\bm{\mu}$ and $k$ .
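As an illustration of how easily the limit can be evaluated, the sketch below computes the per-group terms $-\frac{1}{k\mu_{i}}\log(1-\frac{k_{i}}{n_{i}})$ and their maximum. It assumes the form of Lemma 1 used in the proof of Theorem 2 (the same expression extends to $L$ groups), and compares each term with the exact finite-$n$ mean of an exponential order statistic, $\mathbb{E}[T^{(i)}_{k_{i}:n_{i}}]=\frac{1}{k\mu_{i}}\sum_{j=n_{i}-k_{i}+1}^{n_{i}}\frac{1}{j}$, a standard identity for i.i.d. exponential samples. Function names and parameter values are hypothetical.

```python
import math


def asymptotic_terms(n_vec, k_vec, mu_vec):
    # Per-group limits -log(1 - k_i / n_i) / (k * mu_i), with k = sum(k_vec).
    k = sum(k_vec)
    return [-math.log(1.0 - k_i / n_i) / (k * mu)
            for n_i, k_i, mu in zip(n_vec, k_vec, mu_vec)]


def exact_means(n_vec, k_vec, mu_vec):
    # Exact E[T^{(i)}_{k_i:n_i}] for i.i.d. Exp(k * mu_i) completion times.
    k = sum(k_vec)
    return [sum(1.0 / j for j in range(n_i - k_i + 1, n_i + 1)) / (k * mu)
            for n_i, k_i, mu in zip(n_vec, k_vec, mu_vec)]


n_vec, k_vec, mu_vec = [300, 100], [63, 37], [1.0, 2.0]
approx = asymptotic_terms(n_vec, k_vec, mu_vec)
exact = exact_means(n_vec, k_vec, mu_vec)
print("limit of E[T_comp]:", max(approx))
for a, e in zip(approx, exact):
    print(f"asymptotic {a:.6f}  exact {e:.6f}  rel. gap {abs(a - e) / e:.2%}")
```

Note that the exact means are per group; the limit in Lemma 1 concerns the maximum, which concentrates as $n$ grows.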

Next, we optimize the task allocation vector $\bm{k}$ to minimize the computing time of an $(\bm{n},\bm{k})-$ group code. We define the optimal task allocation vector by

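$$\bm{k}^{*}:=\underset{\bm{k}}{\arg\min}\ \mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))],\tag{2}$$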

whose elements are denoted by $\bm{k}^{*}=[k_{1}^{*},k_{2}^{*},\dots,k_{L}^{*}]$ . Before finding $\bm{k}^{*}$ , we state a relationship between $T_{k:n}$ and $\{T_{k_{i}:n_{i}}^{(i)}\}_{i=1}^{L}$ in the following Lemma. Recall that $T_{k:n}$ is equivalent to $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))$ , and the maximum among $\{T_{k_{i}:n_{i}}^{(i)}\}_{i=1}^{L}$ corresponds to $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))$ by definition.

Lemma 2.

Under the scenario of computing $k$ tasks on an $(\bm{n},\bm{\mu})-$ group system with $L=2$ groups, consider applying an $(\bm{n},\bm{k})-$ group code where $\bm{n}=[n_{1},n_{2}]$ and $\bm{k}=[k_{1},k-k_{1}]$ . Given an arbitrary realization of completion time $\{T_{j}^{(i)}\}_{i\in[2],j\in[n_{i}]}$ of workers, let $T_{k:n}$ be the $k^{th}$ smallest value among $\{T_{j}^{(i)}\}_{i\in[2],j\in[n_{i}]}$ . Meanwhile, $T_{k_{i}:n_{i}}^{(i)}$ denotes the $k_{i}^{th}$ smallest value among $\{T_{j}^{(i)}\}_{j=1}^{n_{i}}$ . Then, we have

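$$\min\big(T_{k_{1}:n_{1}}^{(1)},T_{k-k_{1}:n_{2}}^{(2)}\big)\leq T_{k:n}\leq\max\big(T_{k_{1}:n_{1}}^{(1)},T_{k-k_{1}:n_{2}}^{(2)}\big).\tag{3}$$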

Proof.

Let $\mathbb{U}=\{T_{j}^{(i)}:T_{j}^{(i)}\leq T_{k:n}\ \mathrm{for}\ i\in[2],j\in[n_{i}]\}$. Consider the subset $\mathbb{U}_{1}$ of $\mathbb{U}$ given by $\mathbb{U}_{1}=\{T_{j}^{(1)}:T_{j}^{(1)}\leq T_{k:n}\ \mathrm{for}\ j\in[n_{1}]\}$ and its complement in $\mathbb{U}$, $\mathbb{U}_{1}^{C}=\{T_{j}^{(2)}:T_{j}^{(2)}\leq T_{k:n}\ \mathrm{for}\ j\in[n_{2}]\}$. Here, we define $k_{1}^{\prime}\coloneqq\lvert\mathbb{U}_{1}\rvert$. Notice that $\lvert\mathbb{U}_{1}^{C}\rvert=k-k_{1}^{\prime}$. Then, we may write $T_{k-k_{1}^{\prime}-1:n_{2}}^{(2)}<T_{k:n}<T_{k_{1}^{\prime}+1:n_{1}}^{(1)}$. When $k_{1}^{\prime}<k_{1}$, we have $T_{k:n}<T_{k_{1}^{\prime}+1:n_{1}}^{(1)}\leq T_{k_{1}:n_{1}}^{(1)}$. Similarly, we have $T_{k:n}>T_{k-k_{1}^{\prime}-1:n_{2}}^{(2)}\geq T_{k-k_{1}:n_{2}}^{(2)}$, which leads to $T_{k-k_{1}:n_{2}}^{(2)}\leq T_{k:n}\leq T_{k_{1}:n_{1}}^{(1)}$. When $k_{1}^{\prime}>k_{1}$, we have $T_{k_{1}:n_{1}}^{(1)}\leq T_{k:n}\leq T_{k-k_{1}:n_{2}}^{(2)}$ by the same argument. For $k_{1}^{\prime}=k_{1}$, it is clear that $\min(T_{k_{1}:n_{1}}^{(1)},T_{k-k_{1}:n_{2}}^{(2)})<T_{k:n}=\max(T_{k_{1}:n_{1}}^{(1)},T_{k-k_{1}:n_{2}}^{(2)})$. This completes the proof. ∎
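Lemma 2 is a deterministic statement about every realization, so it can be spot-checked by simulation. The sketch below uses hypothetical parameters with three groups, so it also exercises the $L$-group version stated later as Lemma 4, and counts realizations in which $T_{k:n}$ falls outside $[\min_{i}T^{(i)}_{k_{i}:n_{i}},\max_{i}T^{(i)}_{k_{i}:n_{i}}]$. The count should be zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vec, mu_vec, k_vec = [30, 20, 10], [1.0, 1.5, 2.0], [12, 9, 4]
k = sum(k_vec)

violations = 0
for _ in range(5000):
    # One realization of the completion times, group by group.
    times = [rng.exponential(1.0 / (k * mu), n) for n, mu in zip(n_vec, mu_vec)]
    per_group = [np.sort(T)[k_i - 1] for T, k_i in zip(times, k_vec)]  # T^{(i)}_{k_i:n_i}
    t_kn = np.sort(np.concatenate(times))[k - 1]                        # T_{k:n}
    if not (min(per_group) <= t_kn <= max(per_group)):
        violations += 1
print(violations)
```

The upper bound holds because $k$ workers have finished once every group has reached its $k_{i}$; the lower bound because, strictly before the earliest group reaches its $k_{i}$, fewer than $k$ workers have finished in total.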

In the following theorem, we find the optimal task allocation $\bm{k}^{*}$ , and show that the expected computing time of an $(\bm{n},\bm{k}^{*})-$ group code converges to that of an $(n,k)-$ MDS code for sufficiently large $n$ .

Theorem 2.

Consider a scenario of computing $k$ tasks on an $(\bm{n},\bm{\mu})-$ group system with $L=2$ groups, where an $(\bm{n},\bm{k})-$ group code is applied. In the asymptotic regime of large $n$, the optimal task allocation $\bm{k}^{*}=[k_{1}^{*},k-k_{1}^{*}]$ can be obtained [Footnote 1: Here we assume that $k_{1}^{*}$ is an integer since the task allocation vector $\bm{k}$ consists of integers. However, if $k_{1}^{*}$ is not an integer, the optimal allocation rule is either $\bm{k}=[\lceil k_{1}^{*}\rceil,k-\lceil k_{1}^{*}\rceil]$ or $\bm{k}=[\lfloor k_{1}^{*}\rfloor,k-\lfloor k_{1}^{*}\rfloor]$, since $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]$ is a convex function of $k_{1}$, as shown in the proof.] by solving

$$k_{1}^{*}+n_{2}-n_{2}\Big(1-\frac{k_{1}^{*}}{n_{1}}\Big)^{\mu_{2}/\mu_{1}}=k.\tag{4}$$

Moreover, the expected computing time of an $(\bm{n},\bm{k}^{*})-$ group code satisfies the following:

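$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]=\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))].\tag{5}$$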

Proof.

Combining (1) and (2), we obtain

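$$\lim_{n\to\infty}k_{1}^{*}=\underset{k_{1}\in[k]}{\arg\min}\Big\{\max\Big(-\frac{1}{k\mu_{1}}\log\Big(1-\frac{k_{1}}{n_{1}}\Big),\,-\frac{1}{k\mu_{2}}\log\Big(1-\frac{k-k_{1}}{n_{2}}\Big)\Big)\Big\}.$$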

Note that the first argument of the max function is a strictly increasing convex function of $k_{1}$, while the second is a strictly decreasing convex function. Thus, their maximum is a convex function of $k_{1}$, minimized where the two arguments are equal. Therefore, as $n$ grows to infinity, the minimizer $k_{1}^{*}$ coincides with the intersection point of the two functions, i.e.,

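$$\lim_{n\to\infty}\mathbb{E}[T_{k_{1}^{*}:n_{1}}^{(1)}]=\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}^{*}:n_{2}}^{(2)}].\tag{6}$$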

From (1) and (6), we obtain (4) by simple algebraic manipulation. Now we move on to the proof of (5). First, by applying $\lim_{n\to\infty}\mathbb{E}[\cdot]$ to each term of (3) and using Lemma 1, we obtain

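$$\min\Big(\lim_{n\to\infty}\mathbb{E}[T_{k_{1}:n_{1}}^{(1)}],\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}:n_{2}}^{(2)}]\Big)\leq\lim_{n\to\infty}\mathbb{E}[T_{k:n}]\leq\max\Big(\lim_{n\to\infty}\mathbb{E}[T_{k_{1}:n_{1}}^{(1)}],\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}:n_{2}}^{(2)}]\Big).$$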

When $k_{1}=k_{1}^{*}$, the upper and lower bounds have the same value, as in (6). Thus, by the squeeze theorem, we have

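$$\lim_{n\to\infty}\mathbb{E}[T_{k:n}]=\lim_{n\to\infty}\mathbb{E}[T_{k_{1}^{*}:n_{1}}^{(1)}]=\lim_{n\to\infty}\mathbb{E}[T_{k-k_{1}^{*}:n_{2}}^{(2)}].$$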

Therefore, we obtain (5) by using $T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))=T_{k:n}$ and $T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))=\max_{i\in[L]}T_{k_{i}:n_{i}}^{(i)}$. ∎
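Since the left-hand side of (4) is strictly increasing in $k_{1}^{*}$, the equation can be solved numerically by bisection for any ratio $\mu_{2}/\mu_{1}$. The sketch below is our own illustration (the function name is hypothetical); it also checks that the two per-group terms are equal at the solution, as required by (6). For $\bm{n}=[300,100]$, $\bm{\mu}=[1,2]$ and $k=100$, (4) reduces to $u^{2}-5u+1=0$ with $u=k_{1}^{*}/n_{1}$, so the exact answer is $k_{1}^{*}=150(5-\sqrt{21})\approx 62.6$.

```python
import math


def solve_k1_star(n1, n2, mu1, mu2, k, iters=200):
    """Solve equation (4), k1 + n2 - n2 * (1 - k1/n1)**(mu2/mu1) = k, by bisection."""
    g = lambda k1: k1 + n2 - n2 * (1.0 - k1 / n1) ** (mu2 / mu1) - k
    # g(lo) <= 0 <= g(hi) whenever k <= n1 + n2.
    lo, hi = max(0.0, k - n2), min(float(n1), float(k))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n1, n2, mu1, mu2, k = 300, 100, 1.0, 2.0, 100
k1 = solve_k1_star(n1, n2, mu1, mu2, k)
term1 = -math.log(1 - k1 / n1) / (k * mu1)           # group-1 limit
term2 = -math.log(1 - (k - k1) / n2) / (k * mu2)     # group-2 limit
print(k1, term1, term2)
```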

Recall that an $(n,k)-$ MDS code achieves the optimal computing time as stated in Theorem 1. The above theorem implies that an $(\bm{n},\bm{k})-$ group coded system can asymptotically achieve the optimal computing time by using the optimal task allocation rule $\bm{k}=\bm{k}^{*}$ . Note that (4) can be easily solved when $\mu_{1}/\mu_{2}=2$ by using the quadratic formula. The following corollary provides the optimal task allocation $\bm{k}^{*}$ and the corresponding $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]$ when $\mu_{1}=2\mu_{2}$ .

Corollary 1.

Consider the scenario of computing $k$ tasks on an $(\bm{n},\bm{\mu})-$ group system with $L=2$ and $\bm{\mu}=[2\mu_{2},\mu_{2}]$ . Under the scenario of applying an $(\bm{n},\bm{k})-$ group code on this system, the optimal task allocation $\bm{k}^{*}=[k_{1}^{*},k-k_{1}^{*}]$ is obtained as

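$$k_{1}^{*}=k-n_{2}-\frac{n_{2}^{2}}{2n_{1}}+\sqrt{\Big(n_{2}+\frac{n_{2}^{2}}{2n_{1}}\Big)^{2}-\frac{k}{n_{1}}n_{2}^{2}}.\tag{7}$$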

Moreover, the expected value of the corresponding computing time $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]$ can be calculated as

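$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]=\frac{1}{k\mu_{2}}\log\Bigg(\sqrt{\Big(1+\frac{n_{2}}{2n_{1}}\Big)^{2}-\frac{k}{n_{1}}}-\frac{n_{2}}{2n_{1}}\Bigg)^{-1}.\tag{8}$$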

Proof.

When $\mu_{1}=2\mu_{2}$, equation (4) becomes a quadratic equation in $\sqrt{1-k_{1}^{*}/n_{1}}$, and its admissible root yields (7). In addition, inserting (7) into (1) results in (8). ∎
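The closed forms (7) and (8) are easy to sanity-check numerically. The snippet below, with hypothetical values $n_{1}=150$, $n_{2}=90$, $k=120$ and $\mu_{2}=1$, evaluates (7) and (8), then confirms that $k_{1}^{*}$ satisfies (4) with $\mu_{2}/\mu_{1}=1/2$ and that (8) matches the maximum of the two per-group terms in (1).

```python
import math


def corollary1(n1, n2, mu2, k):
    # Equation (7): optimal k1* when mu1 = 2 * mu2.
    c = n2 + n2 ** 2 / (2 * n1)
    k1 = k - n2 - n2 ** 2 / (2 * n1) + math.sqrt(c ** 2 - (k / n1) * n2 ** 2)
    # Equation (8): the limiting expected computing time at k1*.
    b = n2 / (2 * n1)
    t = math.log(1.0 / (math.sqrt((1 + b) ** 2 - k / n1) - b)) / (k * mu2)
    return k1, t


n1, n2, mu2, k = 150, 90, 1.0, 120
mu1 = 2 * mu2
k1, t = corollary1(n1, n2, mu2, k)
residual_4 = k1 + n2 - n2 * (1 - k1 / n1) ** (mu2 / mu1) - k          # should be ~0
t_from_terms = max(-math.log(1 - k1 / n1) / (k * mu1),
                   -math.log(1 - (k - k1) / n2) / (k * mu2))         # equation (1)
print(k1, t, residual_4, t - t_from_terms)
```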

IV-B Numerical Results when the Number of Nodes Is Finite

Here, we provide simulation results on the computing time of an $(\bm{n},\bm{k})-$ group code when the number of nodes $n$ is finite. Fig. 3 illustrates the expected computing time of an $(n,k)-$ MDS code, $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))]$, and that of an $(\bm{n},\bm{k})-$ group code, $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]$, for various $n$. We consider two types of group codes: one with the optimal task allocation $\bm{k}^{*}=[k_{1}^{*},k-k_{1}^{*}]$, and the other with an even task allocation $\bm{k}^{\mathrm{even}}=[\frac{1}{2}k,\frac{1}{2}k]$. For a fixed number of tasks $k=100$, we assume that the $n$ workers are divided into two groups as $\bm{n}=[n_{1},n_{2}]=[\frac{3}{4}n,\frac{1}{4}n]$. Moreover, the average computing time of a worker in the first group is twice that in the second group, i.e., $\bm{\mu}=[\mu_{1},\mu_{2}]=[1,2]$. For the estimation, we employ Monte Carlo methods with $10^{4}$ random samples. The simulation results demonstrate that the expected computing time of an $(\bm{n},\bm{k}^{*})-$ group code approaches that of an $(n,k)-$ MDS code in the asymptotic regime of large $n$, as proved in Theorem 2. Moreover, there is a significant gap between the average computing times of the two group codes, the optimal group code $C_{\mathrm{G}}(\bm{n},\bm{k}^{*})$ and the naive group code $C_{\mathrm{G}}(\bm{n},\bm{k}^{\mathrm{even}})$, which supports the necessity of a careful task allocation that accounts for the heterogeneity of groups.
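A single point of this experiment can be approximated with the Monte Carlo sketch below. It follows the stated setup ($k=100$, $\bm{n}=[\frac{3}{4}n,\frac{1}{4}n]$, $\bm{\mu}=[1,2]$, $10^{4}$ samples) at one illustrative value $n=400$ rather than the full sweep of Fig. 3. Because $\mu_{2}/\mu_{1}=2$ here, (4) becomes the quadratic $n_{2}u^{2}-(n_{1}+2n_{2})u+k=0$ in $u=k_{1}^{*}/n_{1}$; rounding $k_{1}^{*}$ to the nearest integer is our simplification, and the function names are hypothetical.

```python
import numpy as np


def kth_smallest(T, kk):
    # kk-th smallest value (1-indexed) in each row.
    return np.partition(T, kk - 1, axis=1)[:, kk - 1]


def simulate(n, k=100, mu=(1.0, 2.0), trials=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n1, n2 = 3 * n // 4, n // 4
    # mu2/mu1 = 2: equation (4) is n2*u^2 - (n1 + 2*n2)*u + k = 0 in u = k1/n1;
    # for these parameters the smaller root is the one in [0, 1].
    u = ((n1 + 2 * n2) - np.sqrt((n1 + 2 * n2) ** 2 - 4 * n2 * k)) / (2 * n2)
    k1_opt = int(round(n1 * u))
    T1 = rng.exponential(1.0 / (k * mu[0]), size=(trials, n1))
    T2 = rng.exponential(1.0 / (k * mu[1]), size=(trials, n2))
    mds = kth_smallest(np.hstack([T1, T2]), k).mean()
    opt = np.maximum(kth_smallest(T1, k1_opt), kth_smallest(T2, k - k1_opt)).mean()
    even = np.maximum(kth_smallest(T1, k // 2), kth_smallest(T2, k - k // 2)).mean()
    return k1_opt, mds, opt, even


k1_opt, mds, opt, even = simulate(n=400)
print(f"k1* = {k1_opt}, MDS {mds:.5f}, group k* {opt:.5f}, group even {even:.5f}")
```

The MDS average can never exceed the group-code average (Theorem 1 holds per realization), while the even split pays for loading the slower group too lightly relative to its size.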

V Computing Time Analysis for General $L$

V-A Computing Time for an Arbitrary Task Allocation $\bm{k}$

This section provides the expected computing time of an $(\bm{n},\bm{k})-$ group code for an arbitrary number of groups, i.e., $L\geq 2$. The following lemma provides a numerical way to obtain $\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]$ as $n$ grows to infinity.

Lemma 3.

Consider an $(\bm{n},\bm{\mu})-$ group system with $L$ groups. Then, the expected computing time of an $(\bm{n},\bm{k})-$ group code satisfies the following:

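$$\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}))]=\max_{i\in[L]}\Big(-\frac{1}{k\mu_{i}}\log\Big(1-\frac{k_{i}}{n_{i}}\Big)\Big).$$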

Proof.

The proof is given in Appendix B. ∎

This lemma shows that the expected computing time of an $(\bm{n},\bm{k})-$ group code can be easily computed when $\bm{n}$, $\bm{k}$, and $\bm{\mu}$ are given.

V-B Optimizing Task Allocation

In this subsection, we present the optimal task allocation rule $\bm{k}^{*}$ for given parameters $\bm{n},\bm{\mu}$ and $k$ . Before optimizing the task allocation vector $\bm{k}$ , we provide a relationship between order statistics $T_{k:n}$ and $\{T^{(i)}_{k_{i}:n_{i}}\}_{i=1}^{L}$ .

Lemma 4.

Under the scenario of computing $k$ tasks on an $(\bm{n},\bm{\mu})-$ group system with $L$ groups, consider applying an $(\bm{n},\bm{k})-$ group code where $\bm{n}=[n_{1},n_{2},\dots,n_{L}]$ and $\bm{k}=[k_{1},k_{2},\dots,k_{L}]$ . Given an arbitrary realization of completion time $\{T_{j}^{(i)}\}_{i\in[L],j\in[n_{i}]}$ of workers, let $T_{k:n}$ be the $k^{th}$ smallest value among $\{T_{j}^{(i)}\}_{i\in[L],j\in[n_{i}]}$ . Meanwhile, $T_{k_{i}:n_{i}}^{(i)}$ denotes the $k_{i}^{th}$ smallest value among $\{T_{j}^{(i)}\}_{j=1}^{n_{i}}$ . Then, we have

$$\min_{i\in[L]}T_{k_{i}:n_{i}}^{(i)}\leq T_{k:n}\leq\max_{i\in[L]}T_{k_{i}:n_{i}}^{(i)}.$$

Proof.

The proof can be found in Appendix C. ∎

Here we recall that $T_{k:n}$ is the computing time of an $(n,k)-$ MDS code and that the upper bound $\max_{i\in[L]}T^{(i)}_{k_{i}:n_{i}}$ is the computing time of an $(\bm{n},\bm{k})-$ group code. Now, Theorem 3 specifies the optimal task allocation $\bm{k}^{*}$, defined in (2), when there are $L$ groups, and compares the computing time of an $(\bm{n},\bm{k}^{*})-$ group code with that of an $(n,k)-$ MDS code.

Theorem 3.

Consider the scenario of computing $k$ tasks on an $(\bm{n},\bm{\mu})-$ group system with $L$ groups, where an $(\bm{n},\bm{k})-$ group code is applied. In the asymptotic regime of large $n$, the optimal task allocation $\bm{k}^{*}=[k_{1}^{*},k_{2}^{*},\cdots,k_{L}^{*}]$ can be obtained [Footnote 2: Here we assume that $k_{i}^{*}$ is an integer for $i\in[L]$ since the task allocation vector $\bm{k}$ consists of integer values. However, if $k_{i}^{*}$ is not an integer, we can use the round function to set $\bm{k}^{*}=[\lfloor k_{1}^{*}\rceil,\lfloor k_{2}^{*}\rceil,\cdots,\lfloor k_{L}^{*}\rceil]$. For reasonably large $n$ and $k$, this rounding has a negligible impact on the overall performance.] by solving the following equations for $i\in[L]$:

[TABLE]

Moreover, the corresponding expected computing time of an $(\bm{n},\bm{k}^{*})-$ group code is equal to that of an $(n,k)-$ MDS code as $n$ goes to infinity, i.e.

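$$\lim_{n\to\infty}\mathbb{E}\left[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))\right]=\lim_{n\to\infty}\mathbb{E}\left[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))\right].$$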

Proof.

We prove this theorem at Appendix D. ∎

Recall that an $(n,k)-$ MDS code is optimal in terms of computing time. The above theorem illustrates that an $(\bm{n},\bm{k})-$ group code can asymptotically achieve the optimal computing time when the tasks are optimally allocated, i.e. $\bm{k}=\bm{k}^{*}$ .
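The optimal allocation can be computed numerically with a short root-finding routine. The sketch below assumes the exponential model of Appendix A, under which the asymptotic completion time of group $i$ is $\xi^{(i)}=\frac{1}{k\mu_{i}}\ln\frac{n_{i}}{n_{i}-k_{i}}$. Making all $\xi^{(i)}$ equal and requiring $\sum_{i}k_{i}=k$ leaves one increasing equation in $k_{1}$, which is solved here by bisection. The function names and the three-group system are hypothetical, and the output is the real-valued $\bm{k}^{*}$ before rounding.

```python
import math


def allocation_given_first(x, n, mu):
    """Return [k_1, ..., k_L] with k_1 = x and every group sharing group 1's xi.

    Solves ln(n_j / (n_j - k_j)) / mu_j = ln(n_1 / (n_1 - x)) / mu_1 for each k_j.
    """
    remaining = 1.0 - x / n[0]
    return [n_j * (1.0 - remaining ** (mu_j / mu[0])) for n_j, mu_j in zip(n, mu)]


def optimal_allocation(n, mu, k, tol=1e-10):
    """Real-valued k* by bisection on h(x) = sum_j k_j(x) = k (h is increasing in x)."""
    if not 0 < k < sum(n):
        raise ValueError("require 0 < k < sum(n)")
    lo, hi = 0.0, float(n[0])  # h(lo) = 0 < k and h(hi) = sum(n) > k
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(allocation_given_first(mid, n, mu)) < k:
            lo = mid
        else:
            hi = mid
    return allocation_given_first(0.5 * (lo + hi), n, mu)


def asymptotic_time(k_i, n_i, mu_i, k):
    """xi^(i) = ln(n_i / (n_i - k_i)) / (k * mu_i) under the exponential model."""
    return math.log(n_i / (n_i - k_i)) / (k * mu_i)


n, mu, k = [100, 80, 60], [1.0, 1.5, 2.0], 120  # hypothetical system
k_star = optimal_allocation(n, mu, k)
times = [asymptotic_time(ki, ni, mi, k) for ki, ni, mi in zip(k_star, n, mu)]
print([round(x, 2) for x in k_star], round(sum(k_star), 6))
print(times)  # all entries agree up to floating-point error
```

Bisection is used only because the left-hand side is monotone in $k_{1}$. Any bracketing root finder would serve, and rounding the result as described in the footnote to Theorem 3 gives an integer allocation.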

VI Decoding Time Analysis

Now we compare the decoding complexity of the suggested $(\bm{n},\bm{k})-$ group code with that of an $(n,k)-$ MDS code. We assume that the decoding complexity of an $(n,k)-$ MDS code is $\mathcal{O}(k^{\beta})$ for $\beta>1$³. Then, by virtue of parallel decoding across the groups, the suggested $(\bm{n},\bm{k})-$ group code has a decoding complexity of $\mathcal{O}((k_{\mathrm{max}})^{\beta})$, where $k_{\mathrm{max}}=\max_{i\in[L]}k_{i}$. Since the decoding complexities of the two schemes grow with different orders, we compare them through the ratio of the two orders:

$$\rho_{\mathrm{dec}}=\frac{(k_{\mathrm{max}})^{\beta}}{k^{\beta}}=\left(\frac{k_{\mathrm{max}}}{k}\right)^{\beta}.$$

³ According to recent works [20, 21] on decoding algorithms, practical scenarios satisfy $\beta>1$.

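Since the $k$ tasks are split among $L$ groups, $k_{\mathrm{max}}\geq k/L$. Hence the ratio $\rho_{\mathrm{dec}}$ is at least $(1/L)^{\beta}$, with equality when $k_{\mathrm{max}}=k/L$, i.e., when the tasks are split equally among the groups.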

Fig. 4 illustrates $\rho_{\mathrm{dec}}$ under two different scenarios for $n=240$ and $k=120$. In both scenarios, $\bm{n}$ and $\bm{\mu}$ are randomly generated, and the task allocation is set to the optimal $\bm{k}^{*}$ for the given $\bm{n}$ and $\bm{\mu}$. Motivated by practical settings in which the size of each group and the average computing time of each worker are bounded, we draw the elements of $\bm{n}$ from $\mathrm{unif}(0.7\frac{n}{L},1.3\frac{n}{L})$ and those of $\bm{\mu}$ from $\mathrm{unif}(1,2)$. The two scenarios differ only in how the elements of $\bm{n}$ and $\bm{\mu}$ are ordered. In Scenario 1, we sort the elements of $\bm{n}$ in ascending order and those of $\bm{\mu}$ in descending order, i.e., $n_{i}\leq n_{j}$ and $\mu_{i}\geq\mu_{j}$ for all $i<j$. This models the case where a group with a smaller average response time has fewer workers. In Scenario 2, both $\bm{n}$ and $\bm{\mu}$ are sorted in descending order, i.e., $n_{i}\geq n_{j}$ and $\mu_{i}\geq\mu_{j}$ for all $i<j$. This models the case where a group with a smaller average response time has more workers. Under each scenario, we average $\rho_{\mathrm{dec}}$ over $10^{4}$ samples with $\beta=2$ and compare the result with the minimum achievable value $\rho_{\mathrm{dec}}=(1/L)^{\beta}$. We also plot a trend line that starts from the Scenario 2 value at $L=2$ and scales as $(1/L)^{\beta}$.

Fig. 4 shows that $\rho_{\mathrm{dec}}$ decreases along with the trend line in both scenarios as $L$ grows. Combined with the definition of $\rho_{\mathrm{dec}}$, this indicates that $k_{\mathrm{max}}$ is roughly inversely proportional to $L$ in practical scenarios. Moreover, the proposed group code provides a significant reduction in decoding complexity in both scenarios. For example, with $L=4$ the $(\bm{n},\bm{k})-$ group code already achieves roughly a 10-fold reduction in decoding complexity compared to an $(n,k)-$ MDS code.

Now we compare the total execution time of the suggested group code with existing schemes through simulation. We model the total execution time as $T_{\mathrm{exec}}=T_{\mathrm{comp}}+\alpha T_{\mathrm{dec}}$, where the coefficient $\alpha\geq 0$ is the relative weight of the decoding complexity with respect to the computing time. We simulate computing $k=400$ tasks on an $(\bm{n},\bm{\mu})-$ group system with $\bm{n}=[180,170,160,140,130,120]$ and $\bm{\mu}=[1.25,1.35,1.45,1.55,1.65,1.75]$, which gives $n=900$ workers in $L=6$ groups. For varying $\alpha$, we observe the execution times of the MDS code, the product code, and the suggested group code with the parameters listed in Table I. The decoding complexity of the product code is $\mathcal{O}((\sqrt{k})^{\beta+1})$ because its decoding procedure consists of decoding $2\sqrt{k}$ MDS codes, each of dimension $\sqrt{k}$. For the group code, we use the optimal task allocation $\bm{k}=\bm{k}^{*}$, and for the decoding complexity we set $\beta=2$.
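A rough sketch of this simulation setup follows. It is not the authors' simulator and rests on several assumptions:

- Worker times are exponential with rate $k\mu_{i}$, as in Appendix A.
- $\bm{k}^{*}$ is computed by bisection and rounded with a largest-remainder rule, so that the $k_{i}$ sum exactly to $k$. The paper uses plain rounding.
- $\mathbb{E}[T_{\mathrm{comp}}]$ of the MDS and group codes is estimated by Monte Carlo.
- $T_{\mathrm{dec}}$ is taken as the Table I orders with constants dropped.

The product code's computing time is omitted because it would require simulating its iterative decoder. All function names are hypothetical.

```python
import math
import random


def optimal_allocation(n, mu, k, iters=100):
    """Real-valued k* under the exponential model (bisection on group 1's share k_1)."""
    def alloc(x):
        # Choose every k_j so that ln(n_j/(n_j-k_j))/mu_j equals group 1's value.
        rem = 1.0 - x / n[0]
        return [nj * (1.0 - rem ** (mj / mu[0])) for nj, mj in zip(n, mu)]
    lo, hi = 0.0, float(n[0])
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if sum(alloc(mid)) < k else (lo, mid)
    return alloc(0.5 * (lo + hi))


def round_to_total(real, total):
    """Largest-remainder rounding so that the integer allocation sums exactly to total."""
    base = [math.floor(x) for x in real]
    by_fraction = sorted(range(len(real)), key=lambda i: real[i] - base[i], reverse=True)
    for i in by_fraction[: total - sum(base)]:
        base[i] += 1
    return base


def mean_computing_times(n, mu, k_alloc, trials=300, seed=0):
    """Monte Carlo estimate of E[T_comp] for the (n,k)-MDS and (n,k)-group codes."""
    rng = random.Random(seed)
    k = sum(k_alloc)
    mds_total = group_total = 0.0
    for _ in range(trials):
        groups = [sorted(rng.expovariate(k * m) for _ in range(ni)) for ni, m in zip(n, mu)]
        mds_total += sorted(t for g in groups for t in g)[k - 1]        # T_{k:n}
        group_total += max(g[ki - 1] for g, ki in zip(groups, k_alloc))  # max_i T^(i)
    return mds_total / trials, group_total / trials


n = [180, 170, 160, 140, 130, 120]  # system from the text
mu = [1.25, 1.35, 1.45, 1.55, 1.65, 1.75]
k, beta = 400, 2
k_alloc = round_to_total(optimal_allocation(n, mu, k), k)
t_mds, t_group = mean_computing_times(n, mu, k_alloc)

# Decoding-cost orders from Table I with constants dropped (an assumption).
t_dec = {"MDS": k ** beta, "Product": math.sqrt(k) ** (beta + 1), "Group": max(k_alloc) ** beta}
print("k* (rounded):", k_alloc, "rho_dec:", (max(k_alloc) / k) ** beta)
for alpha in (0.0, 1e-9, 1e-7):
    # T_exec = T_comp + alpha * T_dec for the two codes simulated here.
    print(alpha, t_mds + alpha * t_dec["MDS"], t_group + alpha * t_dec["Group"])
```

Because the MDS and group times are computed from the same realizations, the estimated MDS computing time can never exceed the group code's (Lemma 4). The gap between them reflects only the finite-$n$ loss of the group code.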

Fig. 5(a) and Fig. 5(b) show the simulated execution times for different regimes of $\alpha$. Fig. 5(a) illustrates the regime where the computing time dominates, i.e., $\alpha$ is small. For the smallest $\alpha$ in Fig. 5(a), the MDS code gives the smallest execution time, followed by the group code and then the product code. This agrees with the two analytical results above: the optimality of the MDS code in Theorem 1 and the asymptotic optimality of the group code in Theorem 3. Note that the coding scheme with the best execution time changes as $\alpha$ varies. Fig. 5(b) represents the regime where the decoding complexity dominates the execution time. As $\alpha$ grows, the MDS code becomes inferior to the other schemes because of its large decoding complexity. On this computing system, the group code outperforms the product code over the entire range of $\alpha$. In general, the order of $k_{\mathrm{max}}$ determines whether the group code or the product code has the lower decoding complexity. Recall that the decoding complexities of the group code and the product code are $\mathcal{O}(k_{\mathrm{max}}^{\beta})$ and $\mathcal{O}((\sqrt{k})^{\beta+1})$, respectively. Thus, the group code has a lower decoding complexity than the product code when

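$$k_{\mathrm{max}}^{\beta}=\mathcal{O}\left((\sqrt{k})^{\beta+1}\right),\quad\text{i.e.,}\quad k_{\mathrm{max}}=\mathcal{O}\left((\sqrt{k})^{1+\frac{1}{\beta}}\right)\tag{11}$$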

holds. Recall from Fig. 4 that $k_{\mathrm{max}}$ is roughly inversely proportional to $L$ in practical scenarios, i.e., $k_{\mathrm{max}}\approx c\,k/L$ for some constant $c\geq 1$. Substituting this into (11), the condition reduces to $L=\Omega\left((\sqrt{k})^{1-\frac{1}{\beta}}\right)$. This implies that when a system has a sufficiently large number of groups, the group code outperforms the product code in terms of decoding complexity.
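Condition (11) can be illustrated numerically under two simplifying assumptions. First, the $\mathcal{O}$-condition is read as a plain inequality with constant 1. Second, $k_{\mathrm{max}}$ takes its idealized value $k/L$ (equal split). The sketch finds the smallest $L$ for which the group code's decoding order does not exceed the product code's. The helper names are hypothetical.

```python
import math


def group_decodes_faster(k, L, beta):
    """Condition (11) read literally: k_max**beta <= sqrt(k)**(beta + 1), with k_max = k / L."""
    k_max = k / L  # idealized equal split (assumption); practical k_max is somewhat larger
    return k_max ** beta <= math.sqrt(k) ** (beta + 1)


def min_groups(k, beta):
    """Smallest number of groups L for which the condition above holds."""
    L = 1
    while not group_decodes_faster(k, L, beta):
        L += 1
    return L


for k in (400, 1600, 6400):
    print(k, min_groups(k, beta=2))  # 400 -> 5, 1600 -> 7, 6400 -> 9
```

Under these assumptions, the number of groups needed grows with $k$. For $k=400$, the threshold of five groups lies below the $L=6$ system simulated above.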

VII Conclusion

In this paper, we propose a coded computation scheme suited to a practical model that reflects the tree-shaped structure and the heterogeneity of groups. Specifically, we consider systems with $L$ heterogeneous groups that have distinct computing-time statistics and different numbers of workers. We prove that the suggested group-coded scheme asymptotically achieves the optimal computing time as $n$ grows to infinity. In the regime of finite $n$, numerical results show that the suggested scheme also provides a near-optimal computing time. Moreover, the suggested scheme can reduce the decoding complexity to as little as $(\frac{1}{L})^{\beta}$ times that of the existing MDS-coded scheme, where $\beta>1$. Finally, the total execution time of the suggested scheme (the sum of the computing time and the decoding time) is numerically shown to outperform other existing state-of-the-art coding schemes.

Appendix A Proof of Lemma 1

We first show that, for sufficiently large $n$, which of $T_{k_{1}:n_{1}}^{(1)}$ and $T_{k_{2}:n_{2}}^{(2)}$ attains $\max(T_{k_{1}:n_{1}}^{(1)},T_{k_{2}:n_{2}}^{(2)})$ is no longer random. Consequently, the expected value of $\max(T_{k_{1}:n_{1}}^{(1)},T_{k_{2}:n_{2}}^{(2)})$ is determined as the larger of the expected values of $T_{k_{1}:n_{1}}^{(1)}$ and $T_{k_{2}:n_{2}}^{(2)}$.

First, consider the $k^{th}$ order statistic $T_{k:n}$ of $n$ i.i.d. random variables whose probability density function (PDF) and cumulative distribution function (CDF) are denoted by $f(\cdot)$ and $F(\cdot)$, respectively. We denote the empirical CDF obtained from the $n$ samples by $\tilde{F}_{n}(\cdot)$. According to [22], $T_{k:n}$ can be represented as

$$T_{k:n}=\xi+\frac{k/n-\tilde{F}_{n}(\xi)}{f(\xi)}+R_{n},\tag{12}$$

where $\xi=F^{-1}(k/n)$ and the remainder term $R_{n}$ satisfies $n^{1/2}R_{n}\xrightarrow{p}0$. In [22], it is shown that $n^{1/2}(T_{k:n}-\xi)\xrightarrow{d}X$, where $X\sim N\left(0,\dfrac{(k/n)(1-k/n)}{f^{2}(\xi)}\right)$. Thus, for large $n$, $T_{k:n}$ is approximately distributed as $N\left(\xi,\dfrac{(k/n)(1-k/n)}{nf^{2}(\xi)}\right)$.
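A quick Monte Carlo sketch of this normal approximation follows, assuming i.i.d. Exp(1) samples and $k/n=1/2$ (both illustrative choices). For an exponential distribution with rate $r$, $\xi=F^{-1}(k/n)=-\ln(1-k/n)/r$ and $f(\xi)=r\,(1-k/n)$. The empirical mean and standard deviation of $T_{k:n}$ should therefore approach $\xi$ and the predicted standard deviation as $n$ grows.

```python
import math
import random
import statistics


def kth_order_statistic(n, k, rate, rng):
    """k-th smallest of n i.i.d. Exp(rate) draws (k is 1-indexed)."""
    return sorted(rng.expovariate(rate) for _ in range(n))[k - 1]


rng = random.Random(0)
rate, p = 1.0, 0.5  # illustrative: Exp(1) samples and k/n = 1/2
results = {}
for n in (100, 400, 1600):
    k = int(p * n)
    xi = -math.log(1.0 - p) / rate            # xi = F^{-1}(k/n)
    f_xi = rate * math.exp(-rate * xi)        # f(xi), equal to rate * (1 - p)
    predicted_sd = math.sqrt(p * (1 - p) / (n * f_xi ** 2))
    samples = [kth_order_statistic(n, k, rate, rng) for _ in range(500)]
    # (empirical mean, xi, empirical sd, predicted sd)
    results[n] = (statistics.mean(samples), xi, statistics.stdev(samples), predicted_sd)
    print(n, [round(v, 4) for v in results[n]])
```

The standard deviation shrinks like $n^{-1/2}$, which is the mechanism behind the convergence in probability used below.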

Now, we examine the convergence of $T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}$ using (12). Let $f_{(i)}$ and $F_{(i)}$ be the PDF and CDF of an exponential random variable with rate $k\mu_{i}$, and define $\xi^{(i)}=F^{-1}_{(i)}(k_{i}/n_{i})$ for $i=1,2$, i.e.,

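$$\xi^{(i)}=F_{(i)}^{-1}\left(\frac{k_{i}}{n_{i}}\right)=\frac{1}{k\mu_{i}}\ln\frac{n_{i}}{n_{i}-k_{i}}.\tag{13}$$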

Then, the asymptotic distribution of $T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}$ can be written as follows:

[TABLE]

where $Z_{V}\sim N(0,V)$ for $V=\dfrac{\frac{k_{1}}{n_{1}}(1-\frac{k_{1}}{n_{1}})}{f_{(1)}^{2}(\xi^{(1)})}+\dfrac{\frac{k_{2}}{n_{2}}(1-\frac{k_{2}}{n_{2}})}{f^{2}_{(2)}(\xi^{(2)})}$ . By the definition of convergence in distribution, for any $\epsilon>0$ , we have
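Because the worker times are exponential, the variance $V$ can be written explicitly. This is a worked step added for clarity. It uses $F_{(i)}(\xi^{(i)})=k_{i}/n_{i}$ and the exponential identity $f_{(i)}(t)=k\mu_{i}\,(1-F_{(i)}(t))$:

$$
f_{(i)}\big(\xi^{(i)}\big)=k\mu_{i}\Big(1-F_{(i)}\big(\xi^{(i)}\big)\Big)=k\mu_{i}\left(1-\frac{k_{i}}{n_{i}}\right),
$$

$$
V=\sum_{i=1}^{2}\frac{\frac{k_{i}}{n_{i}}\left(1-\frac{k_{i}}{n_{i}}\right)}{k^{2}\mu_{i}^{2}\left(1-\frac{k_{i}}{n_{i}}\right)^{2}}=\sum_{i=1}^{2}\frac{k_{i}}{k^{2}\mu_{i}^{2}\,(n_{i}-k_{i})}.
$$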

[TABLE]

Then, the convergence of $T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}$ to $\xi^{(1)}-\xi^{(2)}$ can be derived as follows.

[TABLE]

This means $T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}$ converges in probability towards the constant $\xi^{{(1)}}-\xi^{(2)}$ as $n\to\infty$ , i.e.

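$$T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}\xrightarrow{p}\xi^{(1)}-\xi^{(2)}\quad\text{as } n\to\infty.$$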

This shows that, for sufficiently large $n$, the ordering of the two independent order statistics agrees with the ordering of their asymptotic values $\xi^{(1)}$ and $\xi^{(2)}$. Consequently, provided $\xi^{(1)}\neq\xi^{(2)}$, the sign of $T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}$ loses its randomness and is determined in the asymptotic regime of large $n$. Therefore, we can claim that

[TABLE]

This equation indicates that, in the asymptotic regime of large $n$, the random variable $\mathbbm{1}_{T_{k_{1}:n_{1}}^{(1)}>T_{k_{2}:n_{2}}^{(2)}}$, whose distribution is cumbersome, can be replaced by $\mathbbm{1}_{\xi^{(1)}>\xi^{(2)}}$, a deterministic binary value that is easy to compute.
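The claim that the sign of $T_{k_{1}:n_{1}}^{(1)}-T_{k_{2}:n_{2}}^{(2)}$ becomes deterministic can be illustrated numerically. The sketch uses a hypothetical pair of groups with $n_{1}=n_{2}=60s$, $k_{1}=30s$, $k_{2}=24s$, $\mu_{1}=\mu_{2}=1$ and exponential worker times with rate $k\mu_{i}$, where $s$ is a scale factor. Here $\xi^{(1)}>\xi^{(2)}$, so the estimated probability that $T^{(1)}$ exceeds $T^{(2)}$ should approach 1 as $s$ grows.

```python
import math
import random


def kth_smallest_exp(n, k, rate, rng):
    """k-th smallest of n i.i.d. Exp(rate) draws (k is 1-indexed)."""
    return sorted(rng.expovariate(rate) for _ in range(n))[k - 1]


def xi(k_i, n_i, mu_i, k):
    """Asymptotic value xi^(i) = ln(n_i / (n_i - k_i)) / (k * mu_i)."""
    return math.log(n_i / (n_i - k_i)) / (k * mu_i)


def prob_first_exceeds(scale, trials=400, seed=0):
    """Estimate P(T^(1)_{k1:n1} > T^(2)_{k2:n2}) for a hypothetical two-group system.

    Group sizes grow linearly with `scale` while k_i / n_i stay fixed.
    """
    rng = random.Random(seed)
    n1, k1, mu1 = 60 * scale, 30 * scale, 1.0
    n2, k2, mu2 = 60 * scale, 24 * scale, 1.0
    k = k1 + k2
    hits = sum(
        kth_smallest_exp(n1, k1, k * mu1, rng) > kth_smallest_exp(n2, k2, k * mu2, rng)
        for _ in range(trials)
    )
    return hits / trials


print(xi(30, 60, 1.0, 54) > xi(24, 60, 1.0, 54))  # True: xi^(1) > xi^(2) at scale 1
probs = {s: prob_first_exceeds(s) for s in (1, 4, 16)}
print(probs)  # estimates increase toward 1 as the scale grows
```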

Now we prove the statement of Lemma 1 by using (15) as follows.

[TABLE]

Equality $(a)$ holds since limit and expectation can be interchanged when the random variable is non-negative, which is satisfied because $\max(T_{k_{1}:n_{1}}^{(1)},T_{k_{2}:n_{2}}^{(2)})\geq 0$ . Equality $(b)$ holds by (15). Note that this proof can be directly applied to the min function of two independent order statistics instead of max function.

Appendix B Proof of Lemma 3

We prove the statement by mathematical induction. The base case $L=2$ was already proved in Lemma 1. We now show that if the statement holds for an arbitrary $L\geq 2$, then it also holds for $L+1$. Before the inductive step, we establish the convergence of the max function, which is needed for the proof. Recall that equation (15) shows that the order of two independent order statistics is determined by their expected values for sufficiently large $n$. Thus, for arbitrary $\gamma,\delta\in[L]$, the following statement holds.

[TABLE]

This leads to

[TABLE]

where $i_{\mathrm{max}}=\arg\max_{i\in[L]}\xi^{(i)}$. In other words, for sufficiently large $n$, the maximum of $L$ independent order statistics is the one with the largest expected value.

For the inductive step, assume that the statement holds for $L=L^{\prime}$:

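$$\lim_{n\to\infty}\mathbb{E}\left[\max_{i\in[L^{\prime}]}T^{(i)}_{k_{i}:n_{i}}\right]=\max_{i\in[L^{\prime}]}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}:n_{i}}\right]\right).\tag{17}$$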

We now show that the statement also holds for $L^{\prime}+1$:

[TABLE]

Equality $(\mathrm{c})$ holds since, for sufficiently large $n$, $\max_{i\in[L^{\prime}]}T^{(i)}_{k_{i}:n_{i}}$ equals the order statistic with the largest expected value, as shown in (16). Equality $(\mathrm{d})$ follows from Lemma 1, since it is equivalent to the two-group case $L=2$. Equality $(\mathrm{e})$ holds by the induction hypothesis (17). Thus, we have

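$$\lim_{n\to\infty}\mathbb{E}\left[\max_{i\in[L^{\prime}+1]}T^{(i)}_{k_{i}:n_{i}}\right]=\max_{i\in[L^{\prime}+1]}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}:n_{i}}\right]\right),$$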

which completes the proof of this lemma. Similarly, we can show the corresponding statement for the min function:

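$$\lim_{n\to\infty}\mathbb{E}\left[\min_{i\in[L]}T^{(i)}_{k_{i}:n_{i}}\right]=\min_{i\in[L]}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}:n_{i}}\right]\right).$$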

Appendix C Proof of Lemma 4

Consider the case of three groups. For an arbitrary realization of $\{T_{j}^{(i)}\}_{i\in[3],j\in[n_{i}]}$, the following inequalities hold by Lemma 2.

[TABLE]

We can modify the lower bound using the obvious inequality $\min(T^{(2)}_{k_{2}:n_{2}},T^{(3)}_{k_{3}:n_{3}})\leq\max(T^{(2)}_{k_{2}:n_{2}},T^{(3)}_{k_{3}:n_{3}})$ to obtain

[TABLE]

Thus, we have

$$\min_{i\in[3]}T^{(i)}_{k_{i}:n_{i}}\leq T_{k:n}\leq\max_{i\in[3]}T^{(i)}_{k_{i}:n_{i}}.$$

We can also prove the statement for an arbitrary $L\geq 2$ by repeating this process. Thus, we have

$$\min_{i\in[L]}T^{(i)}_{k_{i}:n_{i}}\leq T_{k:n}\leq\max_{i\in[L]}T^{(i)}_{k_{i}:n_{i}}.$$

Appendix D Proof of Theorem 3

We first prove that the optimal task allocation $\bm{k}^{*}$ satisfies the following equations:

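$$\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}^{*}:n_{i}}\right]=\lim_{n\to\infty}\mathbb{E}\left[T^{(j)}_{k_{j}^{*}:n_{j}}\right]\quad\text{for all } i,j\in[L].\tag{18}$$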

Then, we show that an $(\bm{n},\bm{k}^{*})-$ group code achieves the same computing time as an $(n,k)-$ MDS code in the asymptotic regime of large $n$. Finally, we prove the existence and uniqueness of $\bm{k}^{*}$.

First, we rewrite statement (9) of Lemma 3 in terms of $\lim_{n\to\infty}\mathbb{E}[T_{k_{j}:n_{j}}^{(j)}]$ for $j\in[L]$ as follows:

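$$\lim_{n\to\infty}\mathbb{E}\left[\max_{i\in[L]}T^{(i)}_{k_{i}:n_{i}}\right]=\max\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(j)}_{k_{j}:n_{j}}\right],\;\max_{i\neq j}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}:n_{i}}\right]\right)\right).$$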

Note that the first argument of the max function is a strictly increasing convex function of $k_{j}$. In contrast, the second argument $\max_{i\neq j}(\lim_{n\to\infty}\mathbb{E}[T_{k_{i}:n_{i}}^{(i)}])$ is a strictly decreasing convex function of $k_{j}$, because by (9) it equals the time for computing the remaining $k-k_{j}$ tasks with a group code over the other $L-1$ groups. Hence, the maximum of the two arguments is a convex function of $k_{j}$ that attains its minimum at the intersection of the two. Therefore, the optimal value $k_{j}=k_{j}^{*}$ satisfies

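$$\lim_{n\to\infty}\mathbb{E}\left[T^{(j)}_{k_{j}^{*}:n_{j}}\right]=\max_{i\neq j}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}^{*}:n_{i}}\right]\right).$$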

Equivalently, we can write:

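$$\lim_{n\to\infty}\mathbb{E}\left[T^{(j)}_{k_{j}^{*}:n_{j}}\right]\geq\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}^{*}:n_{i}}\right]\quad\text{for all } i\neq j.$$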

To satisfy the above inequality for all $i\neq j$ and all $j\in[L]$, the optimal task allocation $\bm{k}^{*}$ must satisfy equation (18).

Next, we consider the following bounds, which are obtained by taking $\lim_{n\to\infty}\mathbb{E}[\cdot]$ of the bounds in Lemma 4 and applying Lemma 3:

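$$\min_{i\in[L]}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}:n_{i}}\right]\right)\leq\lim_{n\to\infty}\mathbb{E}\left[T_{k:n}\right]\leq\max_{i\in[L]}\left(\lim_{n\to\infty}\mathbb{E}\left[T^{(i)}_{k_{i}:n_{i}}\right]\right).$$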

For $\bm{k}=\bm{k}^{*}$, the lower and upper bounds above coincide by (18). Hence, $\lim_{n\to\infty}\mathbb{E}[T_{k:n}]$ and $\max_{i\in[L]}(\lim_{n\to\infty}\mathbb{E}[T_{k_{i}^{*}:n_{i}}^{(i)}])$ are equal. These quantities correspond to $\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))]$ and $\lim_{n\to\infty}\mathbb{E}[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))]$, respectively. Thus, we have proved

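$$\lim_{n\to\infty}\mathbb{E}\left[T_{\mathrm{comp}}(C_{\mathrm{G}}(\bm{n},\bm{k}^{*}))\right]=\lim_{n\to\infty}\mathbb{E}\left[T_{\mathrm{comp}}(C_{\mathrm{MDS}}(n,k))\right].$$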

Lastly, we prove the existence and uniqueness of $\bm{k}^{*}$. Note that $k_{i}^{*}$ is confined to the interval $k_{i}^{*}\in[\max(0,k-n+n_{i}),\min(n_{i},k)]$, because $k_{j}\leq n_{j}$ for every $j\in[L]$ and the allocation sums to $k\leq n$. By inserting equation (13) into (18), we obtain the following equation for $i,j\in[L]$:

$$\frac{1}{k\mu_{i}}\ln\frac{n_{i}}{n_{i}-k_{i}^{*}}=\frac{1}{k\mu_{j}}\ln\frac{n_{j}}{n_{j}-k_{j}^{*}}.$$

Thus, we may write the following equation, which involves only the single variable $k_{i}^{*}$:

$$k=\sum_{j=1}^{L}n_{j}\left[1-\left(1-\frac{k_{i}^{*}}{n_{i}}\right)^{\mu_{j}/\mu_{i}}\right].$$
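The step from the pairwise equalities to a single-variable equation can be spelled out as follows, assuming the allocation satisfies $\sum_{j=1}^{L}k_{j}^{*}=k$. Start from $\frac{1}{k\mu_{i}}\ln\frac{n_{i}}{n_{i}-k_{i}^{*}}=\frac{1}{k\mu_{j}}\ln\frac{n_{j}}{n_{j}-k_{j}^{*}}$ and multiply both sides by $-k\mu_{j}$:

$$
\ln\frac{n_{j}-k_{j}^{*}}{n_{j}}=\frac{\mu_{j}}{\mu_{i}}\ln\frac{n_{i}-k_{i}^{*}}{n_{i}}
\quad\Longrightarrow\quad
k_{j}^{*}=n_{j}\left[1-\left(1-\frac{k_{i}^{*}}{n_{i}}\right)^{\mu_{j}/\mu_{i}}\right].
$$

Summing over $j\in[L]$ expresses $k$ through $k_{i}^{*}$ alone. Differentiating each term also confirms the monotonicity used next:

$$
\frac{d}{dk_{i}^{*}}\sum_{j=1}^{L}n_{j}\left[1-\left(1-\frac{k_{i}^{*}}{n_{i}}\right)^{\mu_{j}/\mu_{i}}\right]=\sum_{j=1}^{L}\frac{n_{j}}{n_{i}}\,\frac{\mu_{j}}{\mu_{i}}\left(1-\frac{k_{i}^{*}}{n_{i}}\right)^{\mu_{j}/\mu_{i}-1}>0\quad\text{for } 0\leq k_{i}^{*}<n_{i}.
$$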

For simplicity, we denote the right-hand side by $h(k_{i}^{*})$. Note that $h(k_{i}^{*})$ is a continuous, strictly increasing function of $k_{i}^{*}$. Thus, we can complete the proof by showing that $h(k_{i}^{*})$ starts from a value lower than $k$ and reaches a value greater than $k$ within the given interval. First, when the lower bound $\max(0,k-n+n_{i})$ is $0$, it is obvious that $h(0)=0<k$. The other case, $k-n+n_{i}>0$, is also easily proved:

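$$h(k-n+n_{i})=(k-n+n_{i})+\sum_{j\neq i}n_{j}\left[1-\left(\frac{n-k}{n_{i}}\right)^{\mu_{j}/\mu_{i}}\right]<(k-n+n_{i})+\sum_{j\neq i}n_{j}=k.$$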

Similarly, when the upper bound $\min(n_{i},k)$ equals $n_{i}$, one can easily show that $h(n_{i})=n>k$. In the other case, $\min(n_{i},k)=k$, i.e., $k<n_{i}$, we also have $h(k)>k$, as follows.

$$h(k)=k+\sum_{j\neq i}n_{j}\left[1-\left(1-\frac{k}{n_{i}}\right)^{\mu_{j}/\mu_{i}}\right]>k.$$

This completes the proof. We have $h(k_{i}^{*})<k$ at the lower end $k_{i}^{*}=\max(0,k-n+n_{i})$ and $h(k_{i}^{*})>k$ at the upper end $k_{i}^{*}=\min(n_{i},k)$. Therefore the continuous, strictly increasing function $h(k_{i}^{*})$ crosses the constant $k$ exactly once, which guarantees the existence and uniqueness of $k_{i}^{*}$.

References


[1] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., “Large scale distributed deep networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1223–1231.
[2] J. Dean and L. A. Barroso, “The tail at scale,” Communications of the ACM, vol. 56, no. 2, pp. 74–80, 2013.
[3] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2018.
[4] K. Lee, C. Suh, and K. Ramchandran, “High-dimensional coded matrix multiplication,” in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 2418–2422.
[5] Q. Yu, M. Maddah-Ali, and S. Avestimehr, “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication,” in Advances in Neural Information Processing Systems, 2017, pp. 4403–4413.
[6] T. Baharav, K. Lee, O. Ocal, and K. Ramchandran, “Straggler-proofing massive-scale distributed matrix multiplication with d-dimensional product codes,” 2018.
[7] N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, “Gradient coding from cyclic MDS codes and expander graphs,” arXiv preprint arXiv:1707.03858, 2017.
[8] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning, 2017, pp. 3368–3376.
