Upper and Lower Bounds on the Cost of a Map-Reduce Computation
Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

TL;DR
This paper establishes upper and lower bounds on the communication cost of single-round map-reduce computations. It analyzes the tradeoff between parallelism and communication for several problems and gives tight bounds for some of them.
Contribution
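It introduces a generic model of problems solvable in a single round of map-reduce. The model yields a recipe for lower bounds on communication cost as a function of the maximum number of inputs assigned to one reducer. The paper then applies this recipe to derive tight bounds for specific problems.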
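The generic lower-bound argument behind this model can be written out as a short derivation. The notation is introduced here for illustration: |I| inputs, |O| outputs, reducer i receives q_i <= q inputs, r is the replication rate (average number of reducers each input is sent to), and g(q) is the largest number of outputs a reducer with q inputs can cover. The argument assumes g(q)/q is nondecreasing in q.

```latex
% Every output must be covered by some reducer, and g(q_i)/q_i <= g(q)/q for q_i <= q:
\sum_i g(q_i) \ge |O|
\;\Longrightarrow\;
\sum_i q_i \frac{g(q)}{q} \;\ge\; \sum_i q_i \frac{g(q_i)}{q_i} \;\ge\; |O|
\;\Longrightarrow\;
r = \frac{\sum_i q_i}{|I|} \;\ge\; \frac{q\,|O|}{g(q)\,|I|}.

% Example (Hamming distance 1 on b-bit strings): |I| = 2^b, |O| = b\,2^{b-1},
% and q strings contain at most g(q) = (q/2)\log_2 q pairs at distance 1, so
r \;\ge\; \frac{q \cdot b\,2^{b-1}}{(q/2)\log_2 q \cdot 2^b} \;=\; \frac{b}{\log_2 q}.
```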
Findings
Exactly matching upper and lower bounds for finding pairs of strings at Hamming distance 1
Approximate bounds for triangle detection in graphs
Matching upper and lower bounds for one-round matrix multiplication
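The following minimal Python sketch makes the one-round tradeoff for matrix multiplication concrete. It simulates a simple grouping scheme for multiplying n x n matrices in memory rather than running on a real map-reduce system. The function name one_round_matmul and the split into g row groups and g column groups are illustrative assumptions, not necessarily the paper's exact construction. Each reducer receives q = 2n^2/g inputs and every matrix entry is sent to g reducers, so the replication rate is r = g = 2n^2/q.

```python
from collections import defaultdict


def one_round_matmul(R, S, g):
    """Simulate one map-reduce round computing P = R x S for n x n lists of lists.

    Hypothetical grouping scheme: rows of R and columns of S are each split into
    g groups of n // g indices; reducer (a, b) gets row group a and column group b.
    """
    n = len(R)
    assert n % g == 0, "this sketch assumes g divides n"
    size = n // g
    reducers = defaultdict(list)  # (a, b) -> list of (matrix tag, row, col, value)

    # Map phase: R[i][j] goes to every reducer in row group i // size,
    # and S[j][k] goes to every reducer in column group k // size.
    for i in range(n):
        for j in range(n):
            for b in range(g):
                reducers[(i // size, b)].append(("R", i, j, R[i][j]))
    for j in range(n):
        for k in range(n):
            for a in range(g):
                reducers[(a, k // size)].append(("S", j, k, S[j][k]))

    # Reduce phase: each reducer computes its size x size block of P.
    P = [[0] * n for _ in range(n)]
    for (a, b), inputs in reducers.items():
        r_vals = {(x, y): v for t, x, y, v in inputs if t == "R"}
        s_vals = {(x, y): v for t, x, y, v in inputs if t == "S"}
        for i in range(a * size, (a + 1) * size):
            for k in range(b * size, (b + 1) * size):
                P[i][k] = sum(r_vals[(i, j)] * s_vals[(j, k)] for j in range(n))

    q = max(len(v) for v in reducers.values())                # reducer size
    r = sum(len(v) for v in reducers.values()) / (2 * n * n)  # replication rate
    return P, q, r
```

Choosing g trades parallelism for communication. With g = 1, everything goes to a single reducer (r = 1, q = 2n^2). Larger g shrinks each reducer's input at the cost of proportionally more replication.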
Abstract
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "embarrassingly parallel," the finer we partition the work of the reducers so that more parallelism can be extracted, the greater will be the total communication between mappers and reducers. We introduce a model of problems that can be solved in a single round of map-reduce computation. This model enables a generic recipe for discovering lower bounds on communication cost as a function of the maximum number of inputs that can be assigned to one reducer. We use the model to analyze the tradeoff for three problems: finding pairs of strings at Hamming distance d, finding triangles and other patterns in a larger graph, and matrix multiplication. For finding strings of Hamming distance 1, we have upper and lower bounds that match exactly. For triangles…
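A small in-memory simulation of one map-reduce round can illustrate the tradeoff for Hamming distance 1. The sketch assumes a segment-splitting scheme. Each b-bit string is cut into c equal segments, and for each segment index i the string is sent to a reducer keyed by i and the string with segment i deleted. Two strings at distance 1 differ in exactly one segment, so they meet at exactly one reducer. The function names hamming_distance and hamming1_pairs_mapreduce are hypothetical, and c is assumed to divide b.

```python
from collections import defaultdict
from itertools import combinations


def hamming_distance(u: str, v: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(u, v))


def hamming1_pairs_mapreduce(b: int, c: int):
    """Simulate one map-reduce round finding all pairs of b-bit strings at distance 1.

    Hypothetical segment-splitting scheme: each string is cut into c segments of
    b // c bits and sent to c reducers, one per segment index i, keyed by
    (i, string with segment i removed).
    """
    assert b % c == 0, "this sketch assumes c divides b"
    seg = b // c
    reducers = defaultdict(list)  # reducer key -> input strings it receives

    # Map phase: replicate every string to c reducers.
    for x in range(2 ** b):
        s = format(x, f"0{b}b")
        for i in range(c):
            key = (i, s[: i * seg] + s[(i + 1) * seg :])
            reducers[key].append(s)

    # Reduce phase: each reducer compares all pairs of strings it received.
    pairs = set()
    for strings in reducers.values():
        for u, v in combinations(strings, 2):
            if hamming_distance(u, v) == 1:
                pairs.add((min(u, v), max(u, v)))

    q = max(len(v) for v in reducers.values())           # reducer size
    r = sum(len(v) for v in reducers.values()) / 2 ** b  # replication rate
    return pairs, q, r


pairs, q, r = hamming1_pairs_mapreduce(b=6, c=3)
print(len(pairs), q, r)  # 192 4 3.0
```

With c segments, each reducer receives q = 2^(b/c) strings and each string is replicated c times, so r = c = b / log2 q. This is consistent with the exactly matching bounds stated above for Hamming distance 1. Setting c = 1 puts all 2^b strings on one reducer (r = 1), while c = b gives reducers of size 2 at replication rate b.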
Taxonomy
Topics: Complexity and Algorithms in Graphs · Graph Theory and Algorithms · Parallel Computing and Optimization Techniques
