On Model Coding for Distributed Inference and Transmission in Mobile Edge Computing Systems

Jingjing Zhang; Osvaldo Simeone

arXiv:1904.05591 · cs.IT · April 12, 2019


TL;DR

This paper explores how coding model information in mobile edge computing can reduce latency in distributed linear inference tasks despite non-deterministic edge node (EN) processing times.

Contribution

It provides an information-theoretic analysis showing that coding model data can significantly lower total latency in distributed inference systems.

Findings

1. Coding reduces overall latency in distributed inference.
2. Cooperative transmission benefits are limited by coding.
3. Coding is crucial for latency reduction despite non-deterministic EN times.

Abstract

Consider a mobile edge computing system in which users wish to obtain the result of a linear inference operation on locally measured input data. Unlike the offloaded input data, the model weight matrix is distributed across wireless Edge Nodes (ENs). ENs have non-deterministic computing times, and they can transmit any shared computed output back to the users cooperatively. This letter investigates the potential advantages obtained by coding model information prior to ENs' storage. Through an information-theoretic analysis, it is concluded that, while generally limiting cooperation opportunities, coding is instrumental in reducing the overall computation-plus-communication latency.

Equations

(1) $\mathcal{I}_{k}(m_{k},\mathbf{s}_{k})=\{\boldsymbol{s}_{i,k}\mathbf{X}:i\in[m_{k}]\}$

(2) $T_{C}=\min\{t:\mathbf{m}(t)\in\mathcal{M}\}$

(3) $v_{n}=\sum_{k=1}^{K}h_{nk}u_{k}+z_{n}$

(4) $t_{k}=\lambda_{k}+\tau m_{k}$

(5) $T_{C}=\max_{k\in[K]}\big(\lambda_{k}+\tau m^{*}_{k}(\boldsymbol{\lambda})\big)$

(6) $\mathbf{m}^{*}(\boldsymbol{\lambda})=\arg\min_{\mathbf{m}\in\mathcal{M}}\max_{k\in[K]}(\lambda_{k}+\tau m_{k})$

(7) $\delta_{D}=\lim_{P\to\infty}\frac{T_{D}}{NL/\log(P)}$

(8) $\delta=\textrm{E}[\delta_{C}]+\gamma\,\textrm{E}[\delta_{D}]$

(9) $\delta_{UC}=\textrm{E}\Bigg[\frac{\max_{k\in[K]}\big(\lambda_{k}+\tau m^{*}_{k}(\boldsymbol{\lambda})\big)}{\tau}+\sum_{i\in[m]}\frac{\gamma}{\min\{r_{i}(\mathbf{m}^{*}(\boldsymbol{\lambda})),N\}}\Bigg]$

(10) $\delta_{MC}=\frac{H_{K}-H_{K-\lceil 1/\mu\rceil}}{\eta\tau}+m(\mu+\gamma)$

(11) $\rho_{1}\rho_{2}\leq K\mu$

(12) $\binom{K}{\rho_{2}}-\binom{K-q}{\rho_{2}}\geq\frac{1}{\rho_{1}}\binom{K}{\rho_{2}}$

(13) $\delta_{HS}=\min_{q}\Bigg[\frac{H_{K}-H_{K-q}}{\eta\tau}+m\mu+\gamma\min_{(\rho_{1},\rho_{2})}\bigg(\sum_{r_{i}=r_{q}}^{r_{max}}\frac{B_{i}}{r_{i}}+\frac{m-\sum_{r_{i}=r_{q}}^{r_{max}}B_{i}}{r_{q}-1}\bigg)\Bigg]$



Full text


The authors are with the Department of Informatics at King’s College London, UK (emails: [email protected], [email protected]). The authors have received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (Grant Agreement No. 725731).

I Introduction

Introduced by the European Telecommunications Standards Institute (ETSI), the concept of mobile edge computing is by now established as a pillar of the 5G network architecture as an enabler of computation-intensive applications on mobile devices [1]. As illustrated in Fig. 1, with mobile edge computing, users offload local data to edge servers connected to wireless Edge Nodes (ENs). These in turn carry out the necessary computations and return the desired output to the users on the wireless downlink. Most academic work on mobile edge computing has focused on the complex resource allocation problem of orchestrating computing and communication resources at the mobiles and at the ENs (see, e.g., [2] and references therein).

Papers in the line of work introduced above either assume generic applications characterized by given input-output rate requirements (e.g., [2]) or optimize the partition of the computing graph of the applications between local and edge computing. Moreover, this body of research has shown the importance of jointly designing the physical-layer transmission strategy and the computing schedule. Importantly, computing the same output at multiple ENs, while generally increasing the computation time, enables cooperation opportunities in the downlink transmission from the ENs to the users [2].

More recently, in a parallel development in the information-theoretic literature, it has been demonstrated that, if the computation of interest has specific properties, coding of either inputs or outputs can help decrease the overall latency. In particular, reference [3] demonstrated the advantages of Maximum Distance Separable (MDS) coding of input matrices in reducing the latency for distributed matrix-vector multiplication in master-worker systems. The impact of coding computational outputs was instead investigated in [4] for Map-Reduce computing tasks.

In this letter, we investigate the role of coding in the mobile edge computing system illustrated in Fig. 1. In the system, each user wishes to compute a linear inference $\mathbf{W}\boldsymbol{x}$ on a local data vector $\boldsymbol{x}$ given a network-side model matrix $\mathbf{W}$ via offloading. The matrix $\mathbf{W}$ is generally large and hence it requires splitting across the servers of multiple ENs. Linear operations are practically important, e.g., for the implementation of recommendation systems based on collaborative filtering [5] or similarity search based on the cosine distance [6]. In both cases, the user-side data is a vector $\boldsymbol{x}$ that embeds the user profile [5] or a query [6], and the goal is to search through the matrix of all items on the basis of the inner products between the corresponding row of matrix $\mathbf{W}$ and the user-data $\boldsymbol{x}$ . This letter presents an information-theoretic framework that enables the potential advantages of model coding and associated performance trade-offs to be quantified.

II System Model and Performance Criteria

II-A System Model

We consider the distributed edge computing model illustrated in Fig. 1, where $N$ users are connected to $K$ ENs through a shared wireless channel. For a given input vector $\boldsymbol{x}\in\mathbb{F}_{2^{L}}^{r\times 1}$ of $rL$ bits provided by a user, the system aims at computing the linear inference operation $\boldsymbol{y}=\mathbf{W}\boldsymbol{x}$ , where the weight, or model, matrix $\mathbf{W}\in\mathbb{F}_{2^{L}}^{m\times r}$ is static for a sufficiently long period of time. Each EN $k$ can store a number of bits equivalent to a fraction $\mu\in[1/K,1]$ of rows of matrix $\mathbf{W}$ , i.e., $m\mu rL$ bits. Storage of information from matrix $\mathbf{W}$ takes place offline given the static nature of the model.

Each user $n$, with $n\in[N]$, has its own personal data $\boldsymbol{x}_{n}\in\mathbb{F}_{2^{L}}^{r\times 1}$ of $rL$ bits, which is collected online by the user, and it wishes to obtain the result of the linear operation $\boldsymbol{y}_{n}=\mathbf{W}\boldsymbol{x}_{n}$. The task is offloaded to the ENs as shown in Fig. 1. First, the ENs acquire the user data $\mathbf{X}=[\boldsymbol{x}_{1},\cdots,\boldsymbol{x}_{N}]$ through uplink transmission. Second, the ENs carry out computations on the received users’ data and on the stored data about $\mathbf{W}$. Finally, via downlink communication, the ENs deliver the results of the computations to the users, so that each user $n$ can recover the required output $\boldsymbol{y}_{n}$.

In this letter, we make the simplifying assumption that the time needed to upload $\mathbf{X}$ to all ENs is fixed and each EN gets the entire matrix $\mathbf{X}$ . This allows us to focus on the challenging problem of jointly designing offline model coding and storage at the ENs, as well as online edge computing and downlink transmission phases. The problem is formulated as follows.

Model Coding and Storage: In an offline phase, the model matrix $\mathbf{W}$ is linearly encoded [7] as $[\boldsymbol{c}_{1}^{T},\cdots,\boldsymbol{c}_{m^{\prime}}^{T}]^{T}=\mathbf{G}\mathbf{W}$, where we have defined the coding matrix $\mathbf{G}\in\mathbb{F}_{2^{L}}^{m^{\prime}\times m}$, with integer $m^{\prime}\geq m$. Each EN $k$ stores a subset $\mathcal{C}_{k}\subseteq\mathcal{C}=\{\boldsymbol{c}_{1},\cdots,\boldsymbol{c}_{m^{\prime}}\}$ of $|\mathcal{C}_{k}|\leq m\mu$ coded rows.

Edge Computing: In the online phase, each EN $k$ computes inner products between all users’ data received in the uplink and the available coded model rows in set $\mathcal{C}_{k}$ . As in [8], the order in which such computations are carried out is specified by vector $\mathbf{s}^{T}_{k}=[\boldsymbol{s}_{1,k},\cdots,\boldsymbol{s}_{m\mu,k}]$ , where each element $s_{i,k}\in\mathbb{F}_{2^{L}}^{1\times r}$ , with $i\in[m\mu]$ , is selected from the set $\mathcal{C}_{k}$ of coded rows available at EN $k$ . In particular, each EN $k$ starts to compute the inner product $\boldsymbol{s}_{1,k}\mathbf{X}$ and continues computing $\boldsymbol{s}_{i,k}\mathbf{X}\in\mathbb{F}_{2^{L}}^{1\times N}$ , for $i=2,3,\cdots,m\mu$ . As in the literature on distributed computing, we refer to each computation $\boldsymbol{s}_{i,k}\mathbf{X}$ as an Intermediate Value (IV) [9]. A computation policy is hence defined by the coding matrix $\mathbf{G}$ , scheduling matrix $\mathbf{S}\in\mathbb{F}_{2^{rL}}^{m\mu\times K}$ , with the $k$ th column vector given as $\mathbf{s}_{k}$ , as well as by a stopping criterion, which is used by the ENs to decide when to stop the computing phase and start downlink transmission.
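To make the edge computing phase concrete, here is a minimal Python sketch for the uncoded case ($\mathbf{G}$ equal to the identity): each EN evaluates its IVs $\boldsymbol{s}_{i,k}\mathbf{X}$ in schedule order, and stacking the $m$ distinct IVs yields $\mathbf{W}\mathbf{X}$, whose columns are the outputs $\boldsymbol{y}_{n}$. It assumes real-valued NumPy arithmetic as a stand-in for $\mathbb{F}_{2^{L}}$, hypothetical toy dimensions, and the example schedules of Section III-A written with 0-based row indices; it is an illustration, not the letter's implementation.

```python
import numpy as np

# Hypothetical toy sizes; real arithmetic stands in for the field F_{2^L}.
m, r, N = 6, 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((m, r))   # model matrix, row i is w_i
X = rng.standard_normal((r, N))   # user data matrix, column n is x_n

# Uncoded schedules s_k (0-based row indices), m*mu = 3 rows per EN.
schedules = [[0, 3, 4], [1, 4, 5], [2, 5, 3]]

ivs = {}                              # row index -> IV w_i X (a 1 x N row)
for k, order in enumerate(schedules):
    for i in order:                   # EN k computes in schedule order
        ivs[i] = W[i] @ X

# Row i of Y collects element i of every output; column n of Y is y_n.
Y = np.vstack([ivs[i] for i in range(m)])
print(np.allclose(Y, W @ X))
```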

To formulate the stopping criterion, we define $\mathbf{m}(t)=[m_{1}(t),\cdots,m_{K}(t)]$ as the vector that indicates how many IVs have been computed by the ENs by time $t$ , with $t=0$ indicating the start of the computing phase and $m_{k}(t)$ denoting the number of computations at each EN $k$ . Note that we have the inequalities $0\leq m_{k}(t)\leq m\mu$ due to the storage constraint. We also define as

$$\mathcal{I}_{k}(m_{k},\mathbf{s}_{k})=\{\boldsymbol{s}_{i,k}\mathbf{X}:i\in[m_{k}]\},\tag{1}$$

the set of first $m_{k}$ IVs computed by EN $k$ for a given choice of the scheduling vector $\mathbf{s}_{k}$ . A computation vector $\mathbf{m}$ is said to be feasible if the union $\bigcup_{k\in[K]}\mathcal{I}_{k}(m_{k},\mathbf{s}_{k})$ of all computed IVs across all $K$ ENs contains enough information to enable the recovery of all the outputs $\{\boldsymbol{y}_{n}\}_{n=1}^{N}$ , i.e., if the conditional entropy $H\big{(}\{\boldsymbol{y}_{n}\}_{n=1}^{N}|\bigcup_{k\in[K]}\mathcal{I}_{k}(m_{k},\mathbf{s}_{k})\big{)}$ equals zero. Note that, if $\mathbf{m}$ is feasible, then any $\mathbf{m^{\prime}}\geq\mathbf{m}$ , where inequality is element-wise, is also feasible.
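Feasibility can be tested numerically with a rank criterion: if the rows of $\mathbf{G}$ behind the computed IVs have rank $m$, a left inverse maps those IVs back to $\mathbf{W}\mathbf{X}$, so the conditional entropy above is zero; when $\mathbf{W}$ has full row rank, the condition is also necessary. The sketch below assumes real arithmetic in place of $\mathbb{F}_{2^{L}}$ and uses a real Vandermonde matrix as a hypothetical stand-in for an MDS generator matrix; `is_feasible` is an illustrative name.

```python
import numpy as np

def is_feasible(G, computed_rows):
    """Rank test for feasibility (real arithmetic stands in for F_{2^L}).

    computed_rows lists the indices j of the coded rows c_j = g_j W whose
    IVs c_j X have been computed by some EN; duplicates are allowed.
    """
    m = G.shape[1]
    rows = sorted(set(computed_rows))
    if len(rows) < m:
        return False          # fewer than m distinct IVs cannot suffice
    return bool(np.linalg.matrix_rank(G[rows]) == m)

m = 4
G_unc = np.eye(m)                                            # uncoded storage
G_mds = np.vander(np.arange(1.0, 9.0), m, increasing=True)   # 8 x 4 stand-in

print(is_feasible(G_unc, [0, 1, 2, 3]))    # True
print(is_feasible(G_unc, [0, 1, 1, 2]))    # False: row 3 never computed
print(is_feasible(G_mds, [1, 4, 6, 7]))    # True: any 4 rows are independent
```

With uncoded storage every model row must be computed somewhere, whereas with the MDS-like generator any $m$ distinct coded IVs suffice, which is the robustness exploited in Section III-B.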

A stopping criterion for a given computation policy is defined by a set $\mathcal{M}$ of feasible computation vectors in the sense that the ENs stop computing at the first time $T_{C}$ such that $\mathbf{m}(T_{C})$ is in set $\mathcal{M}$ , i.e.,

$$T_{C}=\min\{t:\mathbf{m}(t)\in\mathcal{M}\}.\tag{2}$$

As a result, the computed IVs at EN $k$ by the end of the edge computing phase are given as $\mathcal{I}_{k}=\mathcal{I}_{k}(m_{k}(T_{C}),\mathbf{s}_{k})$ . As a simple example, a computation policy may require that all ENs complete all local computations, i.e., $\mathcal{M}=\{[m\mu,m\mu,\cdots,m\mu]\}$ .

Downlink Communication: In this phase, the ENs send the computed IVs to the users on the downlink so that each user $n$ can recover the desired output $\boldsymbol{y}_{n}$. To this end, the ENs apply conventional one-shot linear precoding as in [10, 11]. Accordingly, in each downlink transmission block, the transmitted signal at each EN $k\in[K]$ is given as $u_{k}=a_{k}s_{k}$, where $s_{k}$ is a symbol that encodes a subset of IVs in set $\mathcal{I}_{k}$, and $a_{k}$ is the corresponding beamforming coefficient. All the ENs that have computed the same IVs can transmit them cooperatively via joint beamforming [10, 11]. We impose the per-EN power constraint $\mathbb{E}\left[|u_{k}|^{2}\right]\leq P$. In each downlink block, the signal received by each user $n$ is given as

$$v_{n}=\sum_{k=1}^{K}h_{nk}u_{k}+z_{n},\tag{3}$$

where $h_{nk}\in\mathbb{C}$ is the channel coefficient from EN $k$ to user $n$; $u_{k}\in\mathbb{C}$ is the signal transmitted by EN $k$; and $z_{n}$ is unit-power additive complex Gaussian noise. The fading channels are drawn from a continuous distribution, are constant in each block, and are known to all ENs.

II-B Performance Analysis

As in [12], we assume that the computing time needed by each EN $k$ to perform $m_{k}$ computations is given as

$$t_{k}=\lambda_{k}+\tau m_{k},\tag{4}$$

where $\lambda_{k}\sim\text{exp}(\eta)$ , independent across ENs, is an exponential random variable with average $1/\eta$ that models the time needed for setup at each EN $k$ ; and $\tau$ is the (deterministic) time required for each computation. Under model (4), given a stopping set $\mathcal{M}$ , the random duration $T_{C}$ in (2) of the computation phase can be written as the optimization

$$T_{C}=\max_{k\in[K]}\big(\lambda_{k}+\tau m^{*}_{k}(\boldsymbol{\lambda})\big),\tag{5}$$

where we have defined the stopping vector $\mathbf{m}^{*}(\boldsymbol{\lambda})=[m^{*}_{1}(\boldsymbol{\lambda}),\cdots,m^{*}_{K}(\boldsymbol{\lambda})]$ for a given vector $\boldsymbol{\lambda}=[\lambda_{1},\cdots,\lambda_{K}]$ as

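$$\mathbf{m}^{*}(\boldsymbol{\lambda})=\arg\min_{\mathbf{m}\in\mathcal{M}}\max_{k\in[K]}\,(\lambda_{k}+\tau m_{k}).\tag{6}$$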

This follows since the time needed to realize a computation vector $\mathbf{m}$ is given by $\max_{k\in[K]}(\lambda_{k}+\tau m_{k})$ .
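For a single realization of $\boldsymbol{\lambda}$, the stopping time in (2) can also be obtained by simulating model (4) event by event, which is convenient for checking (5) and (6) numerically. In this sketch (the function name `stopping_time` and its callback interface are hypothetical), EN $k$ finishes its $j$th IV at time $\lambda_{k}+\tau j$, and an EN whose set-up is not over by $T_{C}$ simply ends with $m_{k}(T_{C})=0$.

```python
import heapq

def stopping_time(lam, tau, n_rows, feasible):
    """Event-driven evaluation of (2) under model (4) (hypothetical helper).

    EN k finishes its j-th IV at lam[k] + tau * j, j = 1..n_rows. Returns
    T_C, the first time m(t) satisfies `feasible`, together with m(T_C).
    """
    K = len(lam)
    m_vec = [0] * K
    events = [(lam[k] + tau, k) for k in range(K)]   # next completion per EN
    heapq.heapify(events)
    while events:
        t, k = heapq.heappop(events)
        m_vec[k] += 1
        if feasible(m_vec):
            return t, list(m_vec)
        if m_vec[k] < n_rows:
            heapq.heappush(events, (lam[k] + tau * (m_vec[k] + 1), k))
    raise ValueError("no feasible computation vector is ever reached")

# MDS-style stopping rule: wait until q ENs have completed all n_rows IVs.
lam = [0.9, 0.1, 0.5]
tau, n_rows, q = 0.01, 3, 2
T_C, m_stop = stopping_time(
    lam, tau, n_rows, lambda mv: sum(x == n_rows for x in mv) >= q)
print(T_C, m_stop)
```

With this rule, $T_{C}$ equals the second-smallest set-up time plus $3\tau$, i.e., $0.5+0.03=0.53$, and the slowest EN ends with $m_{k}(T_{C})=0$.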

In the high-SNR regime of interest, we evaluate the downlink phase duration $T_{D}$ by normalizing it by the time $NL/\log(P)$ needed to deliver one IV, of size $NL$ bits, to all $N$ users in the absence of mutual interference. Hence, the normalized communication delay $\delta_{D}$ is given as

$$\delta_{D}=\lim_{P\to\infty}\frac{T_{D}}{NL/\log(P)}.\tag{7}$$

For comparison, we also normalize the computation time $T_{C}$ by the time $\tau$ to compute one IV for all users, obtaining the normalized computation delay $\delta_{C}=T_{C}/\tau$ . Finally, the average total normalized latency $\delta$ of the edge computing system is given as

$$\delta=\textrm{E}[\delta_{C}]+\gamma\,\textrm{E}[\delta_{D}],\tag{8}$$

where parameter $\gamma$ is the ratio between the time $NL/\log(P)$ needed to transmit one IV on an interference-free channel and the time $\tau$ needed to compute one IV at an EN, so that $\delta$ measures the total computation-plus-communication latency in units of $\tau$.

III Uncoded vs. Coded Computing

III-A Uncoded Storage and Computing (UC)

Consider first a standard uncoded strategy whereby each EN stores $m\mu$ rows directly from the model matrix rows $\{\boldsymbol{w}_{i}\}_{i=1}^{m}$. Following, e.g., [8], the scheduling matrix $\mathbf{S}$ is designed in a cyclic manner, so that each vector $\boldsymbol{w}_{i}$ is repeated $K\mu$ times on average across all ENs. As an example, if $m=6$, $\mu=1/2$ and $K=3$, then the scheduling vectors are $\mathbf{s}_{1}=[\boldsymbol{w}_{1},\boldsymbol{w}_{4},\boldsymbol{w}_{5}]$, $\mathbf{s}_{2}=[\boldsymbol{w}_{2},\boldsymbol{w}_{5},\boldsymbol{w}_{6}]$, and $\mathbf{s}_{3}=[\boldsymbol{w}_{3},\boldsymbol{w}_{6},\boldsymbol{w}_{4}]$, so that $\boldsymbol{w}_{1},\boldsymbol{w}_{2},\boldsymbol{w}_{3}$ are stored once and $\boldsymbol{w}_{4},\boldsymbol{w}_{5},\boldsymbol{w}_{6}$ twice. The stopping set $\mathcal{M}$ is defined as the set of all feasible computation vectors, so that every vector $\mathbf{m}\in\mathcal{M}$ ensures that each IV $\boldsymbol{w}_{i}\mathbf{X}$ has been computed by some EN.
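The example above can be checked mechanically. The sketch below encodes $\mathbf{s}_{1},\mathbf{s}_{2},\mathbf{s}_{3}$ with 1-based row labels, confirms that each EN stores $m\mu=3$ rows with an average repetition of $K\mu=1.5$, and enumerates the stopping set $\mathcal{M}$ by brute force over all computation vectors in $\{0,\dots,3\}^{3}$; brute force is only meant for toy sizes like this one, and the helper name is hypothetical.

```python
from itertools import product

# Example of Section III-A: m = 6, mu = 1/2, K = 3; rows are 1-based labels.
m, K, mu = 6, 3, 0.5
schedules = [[1, 4, 5], [2, 5, 6], [3, 6, 4]]
n_rows = int(m * mu)                       # m*mu = 3 rows per EN

assert all(len(s) == n_rows for s in schedules)
copies = [sum(row in s for s in schedules) for row in range(1, m + 1)]
print(copies, sum(copies) / m)             # per-row repetitions, average K*mu

def feasible_uncoded(m_vec):
    """True if the first m_vec[k] IVs of each EN k cover all rows w_1..w_m."""
    done = {row for s, mk in zip(schedules, m_vec) for row in s[:mk]}
    return done == set(range(1, m + 1))

# Brute-force enumeration of the stopping set M (all feasible vectors).
M = [v for v in product(range(n_rows + 1), repeat=K) if feasible_uncoded(v)]
smallest = min(sum(v) for v in M)
minimal = [v for v in M if sum(v) == smallest]
print(minimal)
```

The four minimal vectors $(1,2,3)$, $(2,2,2)$, $(2,3,1)$ and $(3,1,2)$ each compute every IV exactly once; any element-wise larger vector is also in $\mathcal{M}$.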

For each IV $\boldsymbol{w}_{i}\mathbf{X}$ and a given feasible vector $\mathbf{m}\in\mathcal{M}$ , we define as $r_{i}(\mathbf{m})$ the number of times that the IV has been computed across the ENs, i.e., the number of ENs whose set $\mathcal{I}_{k}$ contains the IV. We hence have the constraint $\sum_{i=1}^{m}r_{i}(\mathbf{m})=\sum_{k=1}^{K}{m_{k}}$ . To deliver a single IV computed at $r_{i}(\mathbf{m})$ ENs, cooperative Zero-Forcing (ZF) precoding allows $\min\{r_{i}(\mathbf{m}),N\}$ users to be served at the same time at the maximum high-SNR rate $\log(P)$ , where $\min\{a,b\}$ represents the minimum between the two arguments $a$ and $b$ . This is done by choosing the precoding matrix across the $\min\{r_{i}(\mathbf{m}),N\}$ transmitting ENs to equal the inverse of the (square) channel matrix, upon appropriate power scaling. Hence, the normalized downlink latency (7) for this IV is given as $1/\min\{r_{i}(\mathbf{m}),N\}$ [10, 11]. As a result, the total latency can be characterized as follows.
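The two ingredients of the UC downlink latency, the redundancy counts $r_{i}(\mathbf{m})$ and cooperative ZF, can be sketched as follows. The helper names are hypothetical; the ZF part draws a random square channel, sets the precoder to its inverse, and omits the power scaling mentioned above, so it only illustrates that each served user receives its own symbol free of interference.

```python
import numpy as np

def redundancy(schedules, m_vec, m):
    """r_i(m): number of ENs whose computed set I_k contains IV w_i X."""
    r = np.zeros(m, dtype=int)
    for s, mk in zip(schedules, m_vec):
        for row in s[:mk]:                 # rows use 1-based labels
            r[row - 1] += 1
    return r

def downlink_latency(r, N):
    """Normalized UC downlink latency: sum_i 1 / min{r_i, N} (needs r_i >= 1)."""
    return float(np.sum(1.0 / np.minimum(r, N)))

schedules = [[1, 4, 5], [2, 5, 6], [3, 6, 4]]   # example of Section III-A
m_full = [3, 3, 3]                               # every EN finishes
r = redundancy(schedules, m_full, m=6)
assert r.sum() == sum(m_full)                    # sum_i r_i(m) = sum_k m_k
print(r.tolist(), downlink_latency(r, N=3))

# Cooperative ZF for one IV held by n = min{r_i, N} ENs (power scaling omitted).
rng = np.random.default_rng(1)
n = 2
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
A = np.linalg.inv(H)                 # precoder equal to the channel inverse
s = np.array([1 + 1j, -1 + 0j])     # one symbol per served user
received = H @ (A @ s)               # noiseless received signals
print(np.allclose(received, s))      # each user sees only its own symbol
```

With all ENs finishing, the example gives $r=[1,1,1,2,2,2]$ and a normalized downlink latency of $3\cdot 1+3\cdot\tfrac{1}{2}=4.5$ for $N=3$.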

Proposition 1

With the described uncoded strategy, the average total normalized latency (8) is given as

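$$\delta_{UC}=\textrm{E}\Bigg[\frac{\max_{k\in[K]}\big(\lambda_{k}+\tau m^{*}_{k}(\boldsymbol{\lambda})\big)}{\tau}+\sum_{i\in[m]}\frac{\gamma}{\min\{r_{i}(\mathbf{m}^{*}(\boldsymbol{\lambda})),N\}}\Bigg],\tag{9}$$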

where the stopping vector $\mathbf{m}^{*}(\boldsymbol{\lambda})$ is given in (6), and the expectation is taken over the distribution of the random vector $\boldsymbol{\lambda}$ .
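Proposition 1 has no closed form in general, but the expectation in (9) can be estimated by Monte Carlo simulation. The sketch below draws $\boldsymbol{\lambda}$, runs the uncoded computing phase until every IV has been computed at least once (the stopping set of Section III-A), and averages $T_{C}/\tau$ and $\gamma\sum_{i}1/\min\{r_{i},N\}$. The schedule is the $m=6$, $K=3$ example, while $N=3$, $\eta=1$, $\tau=0.1$ and $\gamma=1$ are arbitrary illustrative values, not taken from the letter; `simulate_uc` is a hypothetical name.

```python
import heapq
import numpy as np

def simulate_uc(schedules, m, N, eta, tau, gamma, trials=2000, seed=0):
    """Monte Carlo estimate of the two terms of (9) for uncoded storage.

    Rows in `schedules` are 1-based labels of w_1..w_m; computing stops as
    soon as every IV w_i X has been computed by at least one EN.
    """
    rng = np.random.default_rng(seed)
    K, n_rows = len(schedules), len(schedules[0])
    comp_sum, comm_sum = 0.0, 0.0
    for _ in range(trials):
        lam = rng.exponential(1.0 / eta, size=K)    # lambda_k ~ exp(eta)
        m_vec = [0] * K
        r = {}                                      # row -> r_i(m(t))
        events = [(lam[k] + tau, k) for k in range(K)]
        heapq.heapify(events)
        while len(r) < m:
            t, k = heapq.heappop(events)
            row = schedules[k][m_vec[k]]            # next IV in EN k's order
            m_vec[k] += 1
            r[row] = r.get(row, 0) + 1
            if m_vec[k] < n_rows:
                heapq.heappush(events, (lam[k] + tau * (m_vec[k] + 1), k))
        comp_sum += t / tau                         # delta_C = T_C / tau
        comm_sum += sum(1.0 / min(ri, N) for ri in r.values())
    return comp_sum / trials, gamma * comm_sum / trials

schedules = [[1, 4, 5], [2, 5, 6], [3, 6, 4]]
e_dc, g_edd = simulate_uc(schedules, m=6, N=3, eta=1.0, tau=0.1, gamma=1.0)
print(e_dc, g_edd, e_dc + g_edd)    # estimate of delta_UC
```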

III-B MDS coded Storage and Computing (MC)

We proceed to consider an MDS-coded scheme that aims at enhancing robustness to straggling ENs [9, 12, 7]. In this scheme, the coding matrix $\mathbf{G}$ is selected as the generator matrix of a $(K\mu m,m)$ MDS code; each EN $k$ stores $m\mu$ distinct coded rows; and the computing order at each EN is arbitrary. Furthermore, the stopping set $\mathcal{M}$ is defined such that, given the fractional cache size $\mu$, the system waits for the fastest $\lceil 1/\mu\rceil$ ENs to finish all their computations. By definition of a $(K\mu m,m)$ MDS code, any $m$ coded IVs suffice for decoding; hence, all the $m$ required output elements in $\{\boldsymbol{y}_{n}\}_{n=1}^{N}$ can be obtained from the $\lceil 1/\mu\rceil m\mu\geq m$ IVs computed at the fastest $\lceil 1/\mu\rceil$ ENs by treating the missing IVs from the slower $K-\lceil 1/\mu\rceil$ ENs as erasures.

With this scheme, there is no redundancy in the set of IVs computed at the ENs and hence no cooperation opportunities are available for downlink transmission. It follows that the $m$ IVs need to be sent sequentially to each user in the downlink using orthogonal transmission, and thus the communication latency is given as $\delta_{D}=m$ .

Proposition 2

With the described MDS coded scheme, the average total latency (8) is given as

$$\delta_{MC}=\frac{H_{K}-H_{K-\lceil 1/\mu\rceil}}{\eta\tau}+m(\mu+\gamma).\tag{10}$$

Proof:

Since only the fastest $\lceil 1/\mu\rceil$ ENs are required to execute their full computations, the average computation time is given as $\textrm{E}[T_{C}]=\textrm{E}[\lambda_{\lceil 1/\mu\rceil:K}]+\tau m\mu=(H_{K}-H_{K-\lceil 1/\mu\rceil})/\eta+\tau m\mu$, where $\lambda_{\lceil 1/\mu\rceil:K}$ is the $\lceil 1/\mu\rceil$th order statistic of the exponential random variables $\{\lambda_{k}\}_{k=1}^{K}$, and $H_{K}=\sum_{k=1}^{K}1/k$ is the $K$th harmonic number (see [12]). Dividing by $\tau$ to obtain $\textrm{E}[\delta_{C}]$ and adding $\gamma\delta_{D}=\gamma m$ yields (10). ∎
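The order-statistic identity used in the proof, $\textrm{E}[\lambda_{q:K}]=(H_{K}-H_{K-q})/\eta$ for i.i.d. $\lambda_{k}\sim\text{exp}(\eta)$, can be sanity-checked numerically. The values $K=6$, $q=2$ and $\eta=0.8$ below are illustrative choices for this sketch.

```python
import numpy as np

def harmonic(n):
    """n-th harmonic number H_n, with H_0 = 0."""
    return sum(1.0 / j for j in range(1, n + 1))

# Illustrative values: K ENs, wait for the q-th fastest set-up, rate eta.
K, q, eta = 6, 2, 0.8
rng = np.random.default_rng(0)
lam = np.sort(rng.exponential(1.0 / eta, size=(200_000, K)), axis=1)

empirical = lam[:, q - 1].mean()                 # sample mean of lambda_{q:K}
closed_form = (harmonic(K) - harmonic(K - q)) / eta
print(empirical, closed_form)
```

Here the closed form evaluates to $(1/6+1/5)/0.8\approx 0.458$, and the sample mean should agree to within Monte Carlo error.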

III-C Hybrid Scheme (HS)

We now propose a hybrid scheme whose aim is to combine the robustness to stragglers afforded by the MDS-coded scheme and the cooperative downlink transmission advantages of the uncoded scheme. The proposed hybrid scheme allows the reduction in computing time via MDS coding to be traded off for savings in communication time via EN cooperation. To this end, we concatenate a $(\rho_{1}m,m)$ MDS code for some $\rho_{1}\geq 1$ with a repetition code that replicates each coded vector to $\rho_{2}$ ENs. By controlling the design parameters $(\rho_{1},\rho_{2})$, the scheme ranges from uncoded storage $(\rho_{1}=1)$ to MDS coding $(\rho_{2}=1)$.

More precisely, following [7], in order to ensure an even distribution of coded rows, the $\rho_{1}m$ coded rows $\{\boldsymbol{c}_{i}\}_{i=1}^{\rho_{1}m}$ are split into $\binom{K}{\rho_{2}}$ disjoint subsets. Each subset $\mathcal{C}_{\mathcal{K}}$ consists of $b=(\rho_{1}m)/\binom{K}{\rho_{2}}$ coded rows, and is indexed by a subset $\mathcal{K}\subseteq[K]$ of size $\rho_{2}$ , i.e., $|\mathcal{K}|=\rho_{2}$ . Each EN $k$ stores all the rows in the set $\bigcup_{\mathcal{K}:k\in\mathcal{K}}\mathcal{C}_{\mathcal{K}}$ , with cardinality $b\binom{K-1}{\rho_{2}-1}=\rho_{1}\rho_{2}m/K$ . Due to the storage constraint $m\mu$ at each EN, we have the constraint
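A minimal sketch of the storage placement described above, for an illustrative design with $K=6$, $m=60$, $\rho_{1}=1.5$ and $\rho_{2}=2$ (so that $\rho_{1}\rho_{2}=K\mu$ for $\mu=0.5$). Assigning consecutive blocks of $b$ coded rows to the subsets $\mathcal{K}$ is an arbitrary choice made here; any partition with $b$ rows per subset behaves the same.

```python
from itertools import combinations
from math import comb

# Illustrative design: rho1 * rho2 = 3 = K * mu with mu = 0.5 (Condition (11)).
K, m, rho1, rho2 = 6, 60, 1.5, 2
n_coded = int(rho1 * m)                        # rho1 * m coded rows
subsets = list(combinations(range(K), rho2))   # all EN subsets of size rho2
b, rest = divmod(n_coded, len(subsets))
assert rest == 0, "b = rho1*m / C(K, rho2) must be an integer"

# Subset j (EN tuple S) receives coded rows j*b, ..., (j+1)*b - 1.
C = {S: range(j * b, (j + 1) * b) for j, S in enumerate(subsets)}
stored = {k: [c for S, rows in C.items() if k in S for c in rows]
          for k in range(K)}

sizes = {len(rows) for rows in stored.values()}
print(sizes, b * comb(K - 1, rho2 - 1), rho1 * rho2 * m / K)
```

Every EN ends up with $b\binom{K-1}{\rho_{2}-1}=6\cdot 5=30=m\mu$ coded rows, and each coded row is stored at exactly $\rho_{2}=2$ ENs.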

$$\rho_{1}\rho_{2}\leq K\mu.\tag{11}$$

We select the stopping set in a manner similar to the MDS coded strategy, so that the computing phase is completed as soon as $q$ ENs complete all their computations, where $q$ is a design parameter. Following [7, Proposition 1], the three design parameters $(q,\rho_{1},\rho_{2})$ need to satisfy the constraint

$$\binom{K}{\rho_{2}}-\binom{K-q}{\rho_{2}}\geq\frac{1}{\rho_{1}}\binom{K}{\rho_{2}}\tag{12}$$

in order to ensure that $m$ distinct coded IVs are computed across the ENs and hence all desired outputs can be recovered. It can be observed that the choice of parameters $(\rho_{1},\rho_{2})$ depends on system parameters $K,\mu$ and $\gamma$ , which are constant, and design parameter $q$ . These parameters are expected to be constant for long periods of time and hence frequent re-encoding is not necessary.

At the end of the computing phase, each computed IV $\boldsymbol{c}_{i}\mathbf{X}$ is available at $r_{i}$ ENs, where $r_{i}$ can be shown to lie in the interval $[r_{min},r_{max}]$, with $r_{min}=\max\{\rho_{2}-(K-q),1\}$ and $r_{max}=\min\{q,\rho_{2}\}$, in a manner similar to [7]. Moreover, for any $r_{i}\in[r_{min},r_{max}]$, the number of computed IVs with redundancy $r_{i}$ is $B_{i}=\binom{q}{r_{i}}\binom{K-q}{\rho_{2}-r_{i}}b$, since there are $\binom{q}{r_{i}}\binom{K-q}{\rho_{2}-r_{i}}$ subsets $\mathcal{K}$ that contain exactly $r_{i}$ of the $q$ ENs that completed their computations, and each such subset contributes $b$ IVs. For downlink transmission, in order to maximize cooperation opportunities, the computed IVs are sent in descending order of redundancy $r_{i}$ by using cooperative ZF precoding to serve $r_{i}$ users simultaneously.
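The redundancy profile $B_{i}$ and Condition (12) can be verified by enumeration. The sketch below reuses the illustrative design $K=6$, $m=60$, $\rho_{1}=1.5$, $\rho_{2}=2$, assumes $q=3$, and without loss of generality labels the $q$ ENs that finish first as $0,\dots,q-1$.

```python
from collections import Counter
from itertools import combinations
from math import comb

K, m, rho1, rho2, q = 6, 60, 1.5, 2, 3
subsets = list(combinations(range(K), rho2))
b = int(rho1 * m) // len(subsets)          # b = rho1*m / C(K, rho2) = 6

finished = set(range(q))                   # WLOG the q fastest ENs are 0..q-1
counts = Counter()                         # redundancy r -> number of IVs
for S in subsets:
    r = len(finished.intersection(S))      # finished ENs that store C_S
    if r > 0:
        counts[r] += b                     # each of the b IVs of C_S is held by r ENs

r_min, r_max = max(rho2 - (K - q), 1), min(q, rho2)
for r in range(r_min, r_max + 1):
    assert counts[r] == comb(q, r) * comb(K - q, rho2 - r) * b   # B for this r

lhs = comb(K, rho2) - comb(K - q, rho2)   # left-hand side of (12)
print(sorted(counts.items()), sum(counts.values()) >= m, lhs >= comb(K, rho2) / rho1)
```

Here $18$ IVs are available at $r=2$ ENs and $54$ at $r=1$, for $72\geq m=60$ distinct coded IVs, consistent with Condition (12) being satisfied ($12\geq 10$).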

Proposition 3

With the described hybrid scheme, the average total latency (8) is given as

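$$\delta_{HS}=\min_{q}\Bigg[\frac{H_{K}-H_{K-q}}{\eta\tau}+m\mu+\gamma\min_{(\rho_{1},\rho_{2})}\bigg(\sum_{r_{i}=r_{q}}^{r_{max}}\frac{B_{i}}{r_{i}}+\frac{m-\sum_{r_{i}=r_{q}}^{r_{max}}B_{i}}{r_{q}-1}\bigg)\Bigg],\tag{13}$$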

where we have defined $r_{q}=\inf\big\{r:\sum_{r_{i}=r}^{r_{max}}B_{i}\leq m\big\}$; and the optimization over parameters $q\in[\lceil 1/\mu\rceil,K]$, $\rho_{1}\in\{1,(q+1)/q,\cdots,K/q\}$, and $\rho_{2}\in[\lfloor q\mu\rfloor:\lfloor K\mu\rfloor]$ is constrained by Conditions (11) and (12).

Proof:

Given any design parameter $q\in[\lceil 1/\mu\rceil,K]$, the average computation time is evaluated as in Proposition 2, yielding the normalized computing latency $(H_{K}-H_{K-q})/(\eta\tau)+m\mu$ as in (10). In the downlink, the $B_{i}$ IVs with redundancy $r_{i}$ require a communication latency $B_{i}/r_{i}$ using cooperative ZF, as explained in Section III-A. In order to deliver $m$ IVs, the IVs with redundancy $r_{i}\in[r_{q},r_{max}]$ are sent in full, while only $m-\sum_{r_{i}=r_{q}}^{r_{max}}B_{i}$ IVs with redundancy $r_{q}-1$ need to be delivered. The corresponding total communication latency is optimized over all design parameters $(q,\rho_{1},\rho_{2})$ that satisfy Conditions (11) and (12). ∎
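Following the proof of Proposition 3, the latency of one design $(q,\rho_{1},\rho_{2})$ can be evaluated by delivering IVs greedily in descending order of redundancy until $m$ IVs have been sent, which reproduces the $r_{q}$ expression in the proof. The helper `hs_latency` is hypothetical, the computing term is taken as stated in the proof, and the outer minimization over $(q,\rho_{1},\rho_{2})$ is left to the caller; the parameter values in the usage line are illustrative.

```python
from math import comb

def harmonic(n):
    """n-th harmonic number H_n, with H_0 = 0."""
    return sum(1.0 / j for j in range(1, n + 1))

def hs_latency(K, m, mu, N, q, rho1, rho2, eta, tau, gamma):
    """Normalized computing, communication and total latency of one hybrid
    design (q, rho1, rho2), following the proof of Proposition 3."""
    b = rho1 * m / comb(K, rho2)
    r_min, r_max = max(rho2 - (K - q), 1), min(q, rho2)
    # B[r]: number of computed IVs available at exactly r ENs.
    B = {r: comb(q, r) * comb(K - q, rho2 - r) * b
         for r in range(r_min, r_max + 1)}
    remaining, comm = m, 0.0
    for r in sorted(B, reverse=True):        # most redundant IVs first
        sent = min(B[r], remaining)
        comm += sent / min(r, N)             # cooperative ZF serves min{r, N} users
        remaining -= sent
    if remaining > 1e-9:
        raise ValueError("fewer than m IVs computed: Condition (12) fails")
    comp = (harmonic(K) - harmonic(K - q)) / (eta * tau) + m * mu
    return comp, comm, comp + gamma * comm

comp, comm, total = hs_latency(K=6, m=60, mu=0.5, N=6, q=3, rho1=1.5,
                               rho2=2, eta=0.8, tau=0.005, gamma=1.0)
print(round(comp, 3), comm, round(total, 3))
```

For this design, the $18$ IVs with $r=2$ cost $9$ and the remaining $42$ IVs with $r=1$ cost $42$, for a normalized communication latency of $51$.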

IV Example and Discussion

In this section, we present a numerical example for a system with $K=N=6$ ENs and users, $m=60$ row vectors in the model matrix $\mathbf{W}$, and fractional cache size $\mu=0.5$. We also set the per-IV computation time to $\tau=0.005$ and consider different values of the average set-up time $1/\eta$. In Fig. 2, we plot the overall average latency $\delta$ as a function of the parameter $\gamma$ in (8), which weighs the communication time relative to the computation time.

As seen in Fig. 2, as $\gamma$ increases, the total latencies of both UC in (9) and MC in (10) grow linearly, and their relative performance depends on the values of $\gamma$ and $\eta$. When $\eta$ is small, i.e., $\eta=0.8$, the variability in the computing times of the ENs is high, and MDS coding mostly outperforms the UC scheme due to its robustness to stragglers. The exception is when $\gamma$ is large enough that the downlink transmission latency becomes dominant, in which case the UC scheme benefits from redundant computations via cooperative EN transmission. In contrast, for larger values of $\eta$, the computing times have low variability and MDS coding is uniformly outperformed by the UC scheme.
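From (10), the MDS-coded latency is affine in $\gamma$ with slope $m$, which explains the linear growth noted above. The sketch evaluates (10) at the parameters of this section with $\eta=0.8$ for a few illustrative values of $\gamma$; it is a direct evaluation of the closed form and does not reproduce Fig. 2.

```python
import math

def harmonic(n):
    """n-th harmonic number H_n, with H_0 = 0."""
    return sum(1.0 / j for j in range(1, n + 1))

def delta_mc(K, mu, m, eta, tau, gamma):
    """Closed form (10) for the MDS-coded scheme."""
    q = math.ceil(1.0 / mu)          # wait for the fastest ceil(1/mu) ENs
    return (harmonic(K) - harmonic(K - q)) / (eta * tau) + m * (mu + gamma)

# Parameters of Section IV with eta = 0.8.
K, m, mu, tau, eta = 6, 60, 0.5, 0.005, 0.8
for gamma in (0.0, 1.0, 2.0):
    print(gamma, round(delta_mc(K, mu, m, eta, tau, gamma), 3))
```

At $\gamma=0$ the value is $(1/5+1/6)/(0.8\cdot 0.005)+30\approx 121.67$, and each unit increase of $\gamma$ adds $m=60$.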

We also observe that the proposed hybrid coding strategy is effective in trading off computation and communication latencies by controlling the balance between robustness to stragglers and cooperation opportunities via the design of parameters $(q,\rho_{1},\rho_{2})$. In fact, by increasing $q$ and $\rho_{2}$, this approach can decrease the communication latency at the cost of a larger computing latency. Apart from very small values of $\gamma$ for large $\eta$, the scheme is seen to outperform both MDS and UC strategies.

An interesting open problem is to design a hybrid strategy that generalizes both the proposed MDS and UC schemes by properly optimizing the scheduling matrix in a manner akin to UC. Other aspects that are left for future work include the investigation of coding schemes that enable the use of ENs’ partial computations [12]; of transmission strategies that carry out simultaneous edge computing and downlink communications; of the impact of partial uplink connectivity; and of protocols able to accommodate an arbitrary number of computing tasks.

Bibliography

[1] T. Taleb et al., “On multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration,” IEEE Commun. Surveys Tutorials, vol. 19, no. 3, pp. 1657–1681, May 2017.
[2] S. Sardellitti, G. Scutari, and S. Barbarossa, “Joint optimization of radio and computational resources for multicell mobile-edge computing,” IEEE Trans. Signal Inf. Process. Over Netw., vol. 1, no. 2, pp. 89–103, June 2015.
[3] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1514–1529, March 2018.
[4] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coding for distributed fog computing,” IEEE Commun. Magazine, vol. 55, no. 4, pp. 34–40, April 2017.
[5] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The Adaptive Web. Springer Berlin/Heidelberg, 2007, pp. 291–324.
[6] R. J. Bayardo, Y. Ma, and R. Srikant, “Scaling up all pairs similarity search,” in WWW, 2007, pp. 131–140.
[7] J. Zhang and O. Simeone, “Improved latency-communication trade-off for map-shuffle-reduce systems with stragglers.” [Online]. Available: http://arxiv.org/abs/1808.06583
[8] E. Ozfatura, S. Ulukus, and D. Gündüz, “Distributed gradient descent with coded partial gradient computations.” [Online]. Available: https://arxiv.org/abs/1811.09271