Assignment of Different-Sized Inputs in MapReduce
Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

TL;DR
This paper studies how to assign inputs of different sizes to capacity-limited reducers in MapReduce so that communication cost is minimized while all inputs needed for each output still meet at a common reducer. It provides approximation algorithms for the NP-hard cases.
Contribution
It is the first work to take input sizes into account in MapReduce mapping schemas, and it develops approximation algorithms for the resulting NP-hard assignment problems.
Findings
Proved the NP-hardness of optimal input assignment with size constraints.
Developed approximation algorithms for near-optimal solutions.
Analyzed the impact of input size considerations on communication costs.
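To give a feel for what an approximation algorithm for this kind of problem can look like, here is a minimal Python sketch of one natural heuristic. It assumes the special case where every pair of inputs must meet at some reducer, packs inputs into bins of half the reducer capacity with first-fit decreasing, and then sends each pair of bins to one reducer. The function names, the data layout, and the choice of heuristic are illustrative assumptions; the summary above does not state which algorithms the paper uses.

```python
from itertools import combinations


def first_fit_decreasing(sizes, bin_capacity):
    """Pack items into bins of the given capacity, largest items first."""
    bins = []  # each entry is [remaining_capacity, [item ids]]
    for item, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        if size > bin_capacity:
            raise ValueError(f"input {item!r} does not fit in a bin")
        for b in bins:
            if b[0] >= size:
                b[0] -= size
                b[1].append(item)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([bin_capacity - size, [item]])
    return [set(b[1]) for b in bins]


def pair_bins_schema(sizes, capacity):
    """Hypothetical heuristic: bins of size capacity/2, one reducer per bin pair.

    Two bins together never exceed the reducer capacity, and any two inputs
    share either a bin or a pair of bins, so every pair of inputs meets.
    """
    bins = first_fit_decreasing(sizes, capacity / 2)
    if len(bins) == 1:
        return bins
    return [a | b for a, b in combinations(bins, 2)]


# Illustrative data: five inputs, reducer capacity 6.
sizes = {"a": 2, "b": 2, "c": 1, "d": 1, "e": 2}
capacity = 6
reducers = pair_bins_schema(sizes, capacity)
for r in reducers:
    print(sorted(r), sum(sizes[i] for i in r))
```

The heuristic requires every input to be no larger than half the reducer capacity; larger inputs would need separate handling, which this sketch does not attempt.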
Abstract
A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this output. Reducers have a capacity, which limits the sets of inputs that they can be assigned. However, individual inputs may vary in terms of size. We consider, for the first time, mapping schemas where input sizes are part of the considerations and restrictions. One of the significant parameters to optimize in any MapReduce job is communication cost between the map and reduce phases. The communication cost can be optimized by minimizing the number of copies of inputs sent to the reducers. The communication cost is closely related to the number of reducers of constrained capacity that are used to accommodate appropriately the inputs, so that the requirement…
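As a minimal sketch of the definition above, the following Python checks whether a candidate mapping schema is valid: no reducer may receive inputs whose total size exceeds its capacity, and every required output must find all of its inputs at a single reducer. The function names, the representation of reducers as sets of input ids, and the choice to measure communication cost as the total size of all input copies sent are assumptions made for illustration.

```python
from itertools import combinations


def is_valid_schema(sizes, reducers, capacity, required_outputs):
    """Return True if the assignment respects capacity and covers every output.

    sizes: dict mapping input id -> size
    reducers: list of sets of input ids, one set per reducer
    required_outputs: iterable of sets of input ids that must meet somewhere
    """
    # Capacity constraint: the inputs sent to a reducer must fit in it.
    for assigned in reducers:
        if sum(sizes[i] for i in assigned) > capacity:
            return False
    # Coverage constraint: each output needs one reducer holding all its inputs.
    for needed in required_outputs:
        if not any(set(needed) <= assigned for assigned in reducers):
            return False
    return True


def communication_cost(sizes, reducers):
    """Total size of all input copies sent to reducers (assumed cost measure)."""
    return sum(sizes[i] for assigned in reducers for i in assigned)


# Illustrative data: three inputs, every pair is a required output.
sizes = {"a": 3, "b": 2, "c": 2}
capacity = 5
outputs = [set(p) for p in combinations(sizes, 2)]

good = [{"a", "b"}, {"a", "c"}, {"b", "c"}]
too_big = [{"a", "b", "c"}]  # total size 7 exceeds capacity 5

print(is_valid_schema(sizes, good, capacity, outputs))
print(is_valid_schema(sizes, too_big, capacity, outputs))
print(communication_cost(sizes, good))
```

In this example each input is copied to two reducers, which illustrates how a tighter capacity forces more replication and therefore a higher communication cost.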
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
