Computing Marginals Using MapReduce
Foto Afrati, Shantanu Sharma, Jeffrey D. Ullman, Jonathan R. Ullman

TL;DR
This paper studies how to compute data-cube marginals of a fixed order in a single round of MapReduce. It analyzes the trade-off between reducer size and replication rate and proposes covering constructions that meet or come close to the minimum replication rate.
Contribution
It introduces a novel perspective linking the problem to covering numbers and provides constructions that approach optimal replication rates for various parameters.
Findings
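The replication rate is minimized when each reducer receives all the inputs needed to compute one marginal of higher order.
Computing the marginals then reduces to covering sets of dimensions with larger sets, a problem studied in combinatorics under the name "covering numbers."
The constructions meet or come close to the minimum possible replication rate for a given reducer size.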
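To make the first finding concrete, here is a minimal sketch of the map step. It assumes a small dense cube with `n` dimensions of extent `d` each, where every cell is an input. The function name `route`, the hard-coded `cover`, and the toy parameters are all hypothetical and are not taken from the paper. Each reducer is identified by a set of dimensions it aggregates over, plus the fixed values of the remaining dimensions, so every input goes to exactly one reducer per set in the cover.

```python
from collections import defaultdict
from itertools import product


def route(cell, cover):
    """Return one reducer key per dimension set in the cover (hypothetical helper)."""
    keys = []
    for dims in cover:
        # The reducer is identified by the values of the dimensions NOT aggregated.
        fixed = tuple((i, v) for i, v in enumerate(cell) if i not in dims)
        keys.append((dims, fixed))
    return keys


# Toy setting (assumed): n = 3 dimensions of extent d = 2, reducers handle m = 2 dimensions.
n, d, m = 3, 2, 2
# Every single dimension (k = 1) lies inside at least one of these 2-sets.
cover = [(0, 1), (1, 2)]

reducers = defaultdict(list)
for cell in product(range(d), repeat=n):
    for key in route(cell, cover):
        reducers[key].append(cell)

# Average number of reducers each input is sent to.
replication_rate = sum(len(v) for v in reducers.values()) / d**n
# Largest number of inputs received by any one reducer.
max_reducer_size = max(len(v) for v in reducers.values())
```

In this toy setting the replication rate equals the number of sets in the cover, and each reducer receives `d**m` inputs, which is the trade-off the findings describe.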
Abstract
We consider the problem of computing the data-cube marginals of a fixed order $k$ (i.e., all marginals that aggregate over $k$ dimensions), using a single round of MapReduce. The focus is on the relationship between the reducer size (number of inputs allowed at a single reducer) and the replication rate (number of reducers to which an input is sent). We show that the replication rate is minimized when the reducers receive all the inputs necessary to compute one marginal of higher order. That observation lets us view the problem as one of covering sets of $k$ dimensions with sets of a larger size $m$, a problem that has been studied under the name "covering numbers." We offer a number of constructions that, for different values of $k$ and $m$, meet or come close to yielding the minimum possible replication rate for a given reducer size.
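The covering view in the abstract can be tried out with a short sketch. The code below uses a generic greedy heuristic, not one of the paper's constructions, to pick larger sets of dimensions until every smaller set is contained in one of them. The names `greedy_cover`, `is_cover`, and `counting_lower_bound`, and the parameters `n` (number of dimensions), `m` (larger set size), and `k` (smaller set size), are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def greedy_cover(n, m, k):
    """Greedily choose m-subsets of range(n) until every k-subset is covered.

    A generic heuristic for illustration only; it need not be optimal.
    """
    uncovered = set(combinations(range(n), k))
    candidates = list(combinations(range(n), m))
    cover = []
    while uncovered:
        # Pick the candidate that contains the most still-uncovered k-subsets.
        best = max(
            candidates,
            key=lambda s: sum(1 for t in combinations(s, k) if t in uncovered),
        )
        cover.append(best)
        uncovered -= set(combinations(best, k))
    return cover


def is_cover(cover, n, k):
    """Check that every k-subset of range(n) lies inside some chosen set."""
    return all(
        any(set(t) <= set(s) for s in cover)
        for t in combinations(range(n), k)
    )


def counting_lower_bound(n, m, k):
    """Each m-set contains comb(m, k) of the comb(n, k) k-subsets."""
    return -(-comb(n, k) // comb(m, k))  # ceiling division


cover = greedy_cover(5, 3, 2)
print(len(cover), counting_lower_bound(5, 3, 2), is_cover(cover, 5, 2))
```

The size of the cover found this way is only an upper bound on the corresponding covering number; the counting bound gives a simple lower bound to compare against.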
