Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding

Tayyebeh Jahani-Nezhad; Mohammad Ali Maddah-Ali

arXiv:2103.01589·cs.IT·March 3, 2021

TL;DR

This paper characterizes the optimal communication cost in heterogeneous gradient coding systems with arbitrary data placement, accounting for stragglers and adversarial nodes, and proposes schemes for exact and approximate gradient computation.

Contribution

It provides a precise formula for the minimum communication cost based on data replication, and introduces schemes for exact and approximate gradient coding in heterogeneous systems.

Findings

01

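The optimal communication cost, normalized by the size of the gradient vectors, equals $(r - s - 2a)^{-1}$, where $r$ is the minimum number of times any data partition is replicated, $s$ is the number of stragglers, and $a$ is the number of adversarial nodes.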

02

The scheme supports polynomial function computation of the aggregated gradient matrix.

03

An approximate coding scheme is proposed for limited data repetition or higher straggler counts.

Abstract

Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with arbitrary data placement, with $s \in \mathbb{N}$ stragglers and $a \in \mathbb{N}$ adversarial nodes. In particular, we show that the optimum communication cost, normalized by the size of the gradient vectors, is equal to $(r - s - 2a)^{-1}$, where $r \in \mathbb{N}$ is the minimum number of times that a data partition is replicated. In other words, the communication cost is determined by the data partition with the minimum replication, irrespective of the structure of the placement. The proposed achievable scheme also allows us to target…
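To make the formula concrete, the following is a minimal sketch that evaluates the normalized optimum communication cost $(r - s - 2a)^{-1}$ for a given data placement. It is not the paper's coding scheme; it only computes the stated cost. The function names, the dictionary representation of the placement, and the example placement are hypothetical, and the check that $r - s - 2a \geq 1$ is an assumption made here so that the expression is well defined.

```python
from fractions import Fraction


def min_replication(placement):
    """Return r: the smallest number of workers holding any one data partition.

    `placement` maps a partition id to the set of worker ids storing it
    (a hypothetical representation chosen for this sketch).
    """
    return min(len(workers) for workers in placement.values())


def normalized_comm_cost(placement, s, a):
    """Evaluate 1 / (r - s - 2a) exactly, as stated in the abstract.

    s: number of stragglers, a: number of adversarial nodes.
    Raises ValueError when r - s - 2a < 1 (assumed infeasible here).
    """
    r = min_replication(placement)
    denom = r - s - 2 * a
    if denom < 1:
        raise ValueError(f"need r - s - 2a >= 1, got {denom} (r={r}, s={s}, a={a})")
    return Fraction(1, denom)


# Hypothetical heterogeneous placement: partitions are replicated 4, 5, and 4 times.
placement = {
    "D1": {0, 1, 2, 3},
    "D2": {1, 2, 3, 4, 5},
    "D3": {0, 2, 4, 5},
}

print(min_replication(placement))                 # 4
print(normalized_comm_cost(placement, s=1, a=0))  # 1/3
print(normalized_comm_cost(placement, s=1, a=1))  # 1
```

In this example only the least-replicated partitions matter: replicating "D2" five times does not lower the cost, which matches the statement that the cost is determined by the partition with the minimum replication.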
