Coded MapReduce
Songze Li, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

TL;DR
Coded MapReduce introduces a coding strategy to significantly reduce inter-server communication during the shuffling phase, accelerating distributed data processing.
Contribution
It presents a novel coding-based approach that reduces communication load in MapReduce, achieving near-optimal efficiency and analyzing the tradeoff between computation and communication.
Findings
Reduces inter-server communication load by a factor proportional to the number of servers.
Achieves the minimum communication load to within a constant multiplicative factor.
Provides analysis of the computation-communication tradeoff in Coded MapReduce.
Abstract
MapReduce is a commonly used framework for executing data-intensive jobs on distributed server clusters. We introduce a variant implementation of MapReduce, namely "Coded MapReduce", to substantially reduce the inter-server communication load for the shuffling phase of MapReduce, thereby accelerating its execution. The proposed Coded MapReduce exploits the repetitive mapping of data blocks at different servers to create coding opportunities in the shuffling phase, so that (key,value) pairs can be exchanged among servers much more efficiently. We demonstrate that Coded MapReduce can cut down the total inter-server communication load by a multiplicative factor that grows linearly with the number of servers in the system, and that it achieves the minimum communication load within a constant multiplicative factor. We also analyze the tradeoff between the "computation load" and the "communication load" of…
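The coding opportunity described in the abstract can be illustrated with a small toy sketch. It is not the paper's construction: the 3-server setup, the file placement, the map_value function, and all helper names below are assumptions chosen for illustration. Each of 6 files is mapped at 2 of the 3 servers, server k reduces key k, and each server multicasts the XOR of two intermediate values so that both other servers can decode the one they are missing.

```python
K = 3                 # servers; by assumption, server k reduces key k
FILES = range(6)
# Hypothetical placement: every file is mapped at exactly 2 of the 3 servers.
PLACEMENT = {0: {0, 1}, 1: {0, 1}, 2: {1, 2}, 3: {1, 2}, 4: {0, 2}, 5: {0, 2}}

def map_value(key, file_id):
    """Stand-in for the intermediate value of `key` computed from `file_id`."""
    return (key * 41 + file_id * 17 + 5) % 251

def known(server):
    """All (key, file) values a server holds after mapping its local files."""
    return {(q, n): map_value(q, n)
            for n in FILES if server in PLACEMENT[n] for q in range(K)}

def needs(server):
    """Values for the server's own key that come from files it did not map."""
    return [(server, n) for n in FILES if server not in PLACEMENT[n]]

def uncoded_shuffle():
    """Each missing value is sent on its own: one transmission per value."""
    return sum(len(needs(k)) for k in range(K))

def coded_shuffle():
    """Each server multicasts one XOR that is useful to both other servers."""
    received = {k: {} for k in range(K)}
    pending = {k: needs(k) for k in range(K)}
    transmissions = 0
    for sender in range(K):
        a, b = [k for k in range(K) if k != sender]
        have = known(sender)
        # A value `a` lacks comes from a file held by `sender` and `b`,
        # so `b` already knows it and can cancel it (and vice versa).
        want_a = next(p for p in pending[a] if p in have)
        want_b = next(p for p in pending[b] if p in have)
        coded = have[want_a] ^ have[want_b]
        transmissions += 1
        received[a][want_a] = coded ^ known(a)[want_b]
        received[b][want_b] = coded ^ known(b)[want_a]
        pending[a].remove(want_a)
        pending[b].remove(want_b)
    return transmissions, received, pending

if __name__ == "__main__":
    count, received, pending = coded_shuffle()
    print("uncoded transmissions:", uncoded_shuffle())
    print("coded transmissions:", count)
```

In this toy setting the coded shuffle uses half as many transmissions as the uncoded one, matching the repetition factor of 2 used in the placement. The example only shows why repeated mapping creates XOR opportunities; it says nothing about the paper's general scheme or its constants.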
Taxonomy
Topics: Cloud Computing and Resource Management · Stochastic Gradient Optimization Techniques · IoT and Edge/Fog Computing
