Assignment Problems of Different-Sized Inputs in MapReduce
Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

TL;DR
This paper investigates input assignment problems in MapReduce where input sizes vary, proving NP-hardness and proposing a bin-packing-based approximation algorithm to optimize communication costs.
Contribution
It is the first work to treat input sizes as part of the constraints on MapReduce mapping schemas, and it provides an approximation algorithm for the resulting NP-hard problems.
Findings
Proves that the problems of assigning inputs so that the required inputs meet at some reducer are NP-hard.
Develops a bin-packing-based approximation algorithm.
Offers near-optimal solutions for input assignment problems.
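The bin-packing idea can be illustrated for the case where every pair of inputs must meet at some reducer. The sketch below is a hypothetical illustration, not the authors' exact algorithm: it assumes every input has size at most q/2 (q being the reducer capacity), packs inputs into bins of capacity q/2 with first-fit decreasing, and gives each pair of bins its own reducer, so any two inputs share a reducer and no reducer exceeds q. The function names are illustrative.

```python
from itertools import combinations

def first_fit_decreasing(sizes, capacity):
    """Pack input indices into bins with first-fit decreasing.

    Returns a list of bins, each a list of indices into `sizes`.
    """
    bins, loads = [], []
    # Place larger inputs first; each goes into the first bin it fits in.
    for i in sorted(range(len(sizes)), key=lambda k: sizes[k], reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([i])
            loads.append(sizes[i])
    return bins

def a2a_schema_via_bins(sizes, q):
    """Hypothetical sketch: every pair of inputs meets at some reducer of capacity q."""
    if any(s > q / 2 for s in sizes):
        raise ValueError("sketch assumes every input has size <= q/2")
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) <= 1:
        # All inputs fit in one bin, hence in one reducer.
        return bins
    # Two bins of size <= q/2 fit in one reducer; one reducer per pair of bins
    # guarantees that any two inputs share at least one reducer.
    return [bins[a] + bins[b] for a, b in combinations(range(len(bins)), 2)]

sizes = [3, 3, 2, 2, 1, 1]
reducers = a2a_schema_via_bins(sizes, 6)
print(len(reducers))  # 6 (four bins, so 4*3/2 reducers)
```

With x bins the sketch uses x(x-1)/2 reducers and sends each input to x-1 of them, so packing the inputs into fewer bins lowers both the reducer count and the communication cost. Inputs larger than q/2 need separate treatment that this sketch does not attempt.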
Abstract
A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers such that, for each required output, there exists a reducer that receives all the inputs that participate in the computation of this output. Reducers have a capacity, which limits the sets of inputs that can be assigned to them. However, individual inputs may vary in size. We consider, for the first time, mapping schemas where input sizes are part of the considerations and restrictions. One of the significant parameters to optimize in any MapReduce job is the communication cost between the map and reduce phases. The communication cost can be optimized by minimizing the number of copies of inputs sent to the reducers. The communication cost is closely related to the number of reducers of constrained capacity that are used to appropriately accommodate the inputs, so that the requirement…
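The mapping-schema definition above can be made concrete with a small checker. This is a sketch under two assumptions: required outputs are given explicitly as sets of inputs, and communication cost is measured as the total size of all input copies sent to reducers (with unit sizes this reduces to counting copies). The function names are hypothetical.

```python
from itertools import combinations

def is_valid_mapping_schema(reducers, sizes, q, required_outputs):
    """Check capacity and coverage for a proposed mapping schema.

    reducers: list of lists of input names assigned to each reducer
    sizes: dict mapping input name -> size
    q: reducer capacity
    required_outputs: iterable of sets of inputs; each set must be
        received in full by at least one reducer
    """
    # Capacity: the inputs on each reducer must fit within q.
    if any(sum(sizes[i] for i in r) > q for r in reducers):
        return False
    # Coverage: some reducer must hold every input of each required output.
    assigned = [set(r) for r in reducers]
    return all(any(set(out) <= r for r in assigned) for out in required_outputs)

def communication_cost(reducers, sizes):
    """Total size of all input copies shipped from the map to the reduce phase."""
    return sum(sizes[i] for r in reducers for i in r)

sizes = {"a": 2, "b": 2, "c": 1}
all_pairs = [set(p) for p in combinations(sizes, 2)]
schema = [["a", "b"], ["a", "c"], ["b", "c"]]
print(is_valid_mapping_schema(schema, sizes, 4, all_pairs))  # True
print(communication_cost(schema, sizes))  # 10
```

With q = 5, the single reducer ["a", "b", "c"] is also valid and costs only 5, which illustrates how the reducer capacity bounds how few copies can suffice.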
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Database Systems and Queries · Data Management and Algorithms
