When Distributed Computation is Communication Expensive
David P. Woodruff, Qin Zhang

TL;DR
This paper studies the communication complexity of statistical and graph problems in a distributed (message-passing) setting. It shows that exact solutions often require prohibitively large communication, so approximation or assumptions about the data layout are needed for efficiency.
Contribution
It proves lower bounds on the total communication required to solve these problems exactly in the message-passing model. These bounds show why approximation, or attention to how the data is distributed across machines, matters for efficiency.
Findings
Exact computation often requires high communication costs.
The simple protocol in which every machine sends its data to a central server is often near-optimal for exact computation.
Approximation, or exploiting how the data is laid out across machines, is essential for communication efficiency.
Abstract
We consider a number of fundamental statistical and graph problems in the message-passing model, where we have k machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the k data sets. The communication is point-to-point, and the goal is to minimize the total communication among the machines. This model captures all point-to-point distributed computational models with respect to minimizing communication costs. Our analysis shows that exact computation of many statistical and graph problems in this distributed setting requires a prohibitively large amount of communication, and often one cannot improve upon the communication of the simple protocol in which all machines send their data to a centralized server. Thus, in order to obtain protocols that are communication-efficient, one has to allow approximation, or…
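The simple "send everything to a server" protocol from the abstract can be illustrated with a minimal Python sketch. The choice of problem (the exact number of distinct elements in the union of the k data sets), the n-bit characteristic-vector encoding over a universe {0, ..., n-1}, and the names `Network`, `encode`, and `naive_distinct_count` are hypothetical choices for illustration, not taken from the paper. Messages are passed in-process and every transmitted bit is tallied, so the total cost is k·n bits regardless of the input.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical point-to-point network that tallies every bit sent."""
    bits_sent: int = 0
    inboxes: dict = field(default_factory=dict)

    def send(self, src, dst, payload: list[int]) -> None:
        # Charge one bit per entry of the 0/1 payload, then deliver it.
        self.bits_sent += len(payload)
        self.inboxes.setdefault(dst, []).append((src, payload))


def encode(items: set[int], n: int) -> list[int]:
    # Characteristic vector of a subset of the universe {0, ..., n-1}
    # (assumed encoding; other encodings would change the bit count).
    if any(not 0 <= x < n for x in items):
        raise ValueError("items must lie in the universe {0, ..., n-1}")
    return [1 if x in items else 0 for x in range(n)]


def naive_distinct_count(site_data: list[set[int]], n: int) -> tuple[int, int]:
    """Every site ships its whole data set to a central server, which then
    computes the exact number of distinct elements in the union.
    Returns (distinct_count, total_bits_sent)."""
    net = Network()
    server = "server"
    for site_id, items in enumerate(site_data):
        net.send(site_id, server, encode(items, n))
    # Server side: OR together the received characteristic vectors.
    union = [0] * n
    for _src, vec in net.inboxes.get(server, []):
        union = [a | b for a, b in zip(union, vec)]
    return sum(union), net.bits_sent
```

For example, with three sites holding {0, 1, 2}, {2, 3}, and {3, 7} over a universe of size 8, `naive_distinct_count([{0, 1, 2}, {2, 3}, {3, 7}], 8)` returns `(5, 24)`: five distinct elements, at a cost of 3 × 8 bits. Per the abstract, for many problems an exact protocol cannot do much better than this kind of all-to-server baseline.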
Taxonomy
Topics: Complexity and Algorithms in Graphs · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
