A Fundamental Tradeoff between Computation and Communication in Distributed Computing
Songze Li, Mohammad Ali Maddah-Ali, Qian Yu, A. Salman Avestimehr

TL;DR
This paper characterizes a fundamental inverse relationship between computation and communication in distributed computing, proposing a coding scheme that optimally balances the two and demonstrating practical speedups in a benchmark application.
Contribution
The paper introduces Coded Distributed Computing (CDC), an optimal coding scheme that spends extra computation to significantly reduce the communication load in distributed systems.
Findings
The CDC scheme achieves the theoretical lower bound on communication load.
Applying CDC to Hadoop TeraSort yields a speedup of roughly 2× to 3.4×.
The paper precisely characterizes the computation-communication tradeoff in distributed computing.
Abstract
How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other. More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "Coded Distributed Computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of r (i.e., evaluating each function at r carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor. An…
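The inverse relationship described in the abstract can be illustrated numerically. The sketch below assumes the closed-form loads given in the full paper for K nodes and an integer computation load r between 1 and K: an uncoded load of 1 - r/K and a coded load of (1/r)(1 - r/K). Neither expression appears in the summary above, so both are assumptions here, and the function names are hypothetical.

```python
from fractions import Fraction


def uncoded_load(r: int, K: int) -> Fraction:
    """Normalized communication load without coding (assumed form: 1 - r/K)."""
    if not 1 <= r <= K:
        raise ValueError("r must be an integer with 1 <= r <= K")
    return 1 - Fraction(r, K)


def coded_load(r: int, K: int) -> Fraction:
    """Normalized communication load with CDC (assumed form: (1/r) * (1 - r/K))."""
    return uncoded_load(r, K) / r


if __name__ == "__main__":
    K = 10  # illustrative number of computing nodes
    for r in range(1, K + 1):
        u, c = uncoded_load(r, K), coded_load(r, K)
        print(f"r={r:2d}  uncoded={float(u):.3f}  coded={float(c):.3f}")
```

Under these assumed forms, the coded load is the uncoded load divided by r, which is the "same factor" reduction the abstract refers to.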
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Caching and Content Delivery
