Vision Paper: Towards an Understanding of the Limits of Map-Reduce Computation
Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman

TL;DR
This paper explores the fundamental limits of Map-Reduce computation, aiming to characterize which problems are efficiently solvable and which are inherently hard due to their distributability properties.
Contribution
It introduces a conceptual framework to understand the inherent computational limits of Map-Reduce and identifies properties that influence problem distributability.
Findings
Some problems do not lend themselves to a natural distribution and have provable lower bounds on the cost of any Map-Reduce algorithm.
Distributability depends on the problem's ability to be partitioned into independent subproblems.
The paper proposes a vision for characterizing problem complexity in distributed data processing.
Abstract
A significant amount of recent research work has addressed the problem of solving various data management problems in the cloud. The major algorithmic challenges in map-reduce computations involve balancing a multitude of factors such as the number of machines available for mappers/reducers, their memory requirements, and communication cost (total amount of data sent from mappers to reducers). Most past work provides custom solutions to specific problems, e.g., performing fuzzy joins in map-reduce, clustering, graph analyses, and so on. While some problems are amenable to very efficient map-reduce algorithms, some other problems do not lend themselves to a natural distribution, and have provable lower bounds. Clearly, the ease of "map-reducability" is closely related to whether the problem can be partitioned into independent pieces, which are distributed across mappers/reducers. What…
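As a rough illustration of the trade-off the abstract describes between per-reducer memory and communication cost, the sketch below uses a toy all-pairs problem in which every pair of items must meet at some reducer. The group-pair assignment scheme and every function name here are assumptions made for illustration; they are not taken from the paper. Items are split into groups, and one hypothetical reducer is created per unordered pair of groups.

```python
from itertools import combinations


def group_pair_schema(n_items, n_groups):
    """Assign items 0..n_items-1 to reducers keyed by unordered group pairs.

    Hypothetical scheme for illustration: item i belongs to group
    i % n_groups and is sent to every reducer whose key contains that group.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups")
    reducers = {pair: [] for pair in combinations(range(n_groups), 2)}
    for item in range(n_items):
        group = item % n_groups
        for pair in reducers:
            if group in pair:
                reducers[pair].append(item)
    return reducers


def communication_cost(reducers):
    """Total number of items sent from mappers to reducers."""
    return sum(len(items) for items in reducers.values())


def max_reducer_input(reducers):
    """Largest number of items any single reducer must hold."""
    return max(len(items) for items in reducers.values())


def covers_all_pairs(reducers, n_items):
    """Check that every pair of distinct items meets in at least one reducer."""
    covered = set()
    for items in reducers.values():
        covered.update(combinations(sorted(items), 2))
    return len(covered) == n_items * (n_items - 1) // 2


if __name__ == "__main__":
    for g in (2, 3, 4, 6):
        schema = group_pair_schema(12, g)
        print(g, communication_cost(schema), max_reducer_input(schema),
              covers_all_pairs(schema, 12))
```

With 12 items, 3 groups give a communication cost of 24 and at most 8 items per reducer, while 6 groups give a cost of 60 and at most 4 items per reducer. In this toy setting, shrinking the memory needed per reducer is paid for with more data shipped from mappers to reducers.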
Taxonomy
Topics: Graph Theory and Algorithms · Data Management and Algorithms · Cloud Computing and Resource Management
