Some Pairs Problems
Jeffrey D. Ullman, Jonathan Ullman

TL;DR
This paper analyzes the limitations of existing MapReduce algorithms for some-pairs problems, introduces a recursive algorithm, and proposes heuristics to improve performance on typical instances.
Contribution
It establishes lower bounds for general algorithms and presents a recursive approach with heuristics to outperform these bounds on common cases.
Findings
No general-purpose MapReduce algorithm can beat both of the two obvious approaches in the worst case.
A recursive algorithm is explored as a way to solve some-pairs problems.
Heuristics can beat the lower bound on typical instances.
Abstract
A common form of MapReduce application involves discovering relationships between certain pairs of inputs. Similarity joins serve as a good example of this type of problem, which we call a "some-pairs" problem. In the framework of Afrati et al. (VLDB 2013), algorithms are measured by the tradeoff between reducer size (the maximum number of inputs a reducer can handle) and the replication rate (the average number of reducers to which an input must be sent). There are two obvious approaches to solving some-pairs problems in general. We show that no general-purpose MapReduce algorithm can beat both of these algorithms in the worst case. We then explore a recursive algorithm for solving some-pairs problems and heuristics for beating the lower bound on common instances of the some-pairs class of problems.
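The two measures named in the abstract can be made concrete with a small sketch. It assumes a schema is represented as a dictionary from reducer id to the set of inputs sent to that reducer; the function names, the toy inputs, and the two example schemas are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def reducer_size(schema):
    """Largest number of inputs assigned to any single reducer."""
    return max(len(inputs) for inputs in schema.values())


def replication_rate(schema, all_inputs):
    """Average number of reducers that each input is sent to."""
    total_copies = sum(len(inputs) for inputs in schema.values())
    return total_copies / len(all_inputs)


def covers(schema, required_pairs):
    """True if every required pair of inputs meets at some reducer."""
    covered = set()
    for inputs in schema.values():
        covered.update(frozenset(p) for p in combinations(inputs, 2))
    return all(frozenset(p) in covered for p in required_pairs)


# Hypothetical instance: four inputs, three pairs that must be compared.
inputs = ["a", "b", "c", "d"]
pairs = [("a", "b"), ("b", "c"), ("c", "d")]

# Illustrative schema 1 (assumption): one small reducer per required pair.
per_pair = {i: set(p) for i, p in enumerate(pairs)}

# Illustrative schema 2 (assumption): one large reducer holding every input.
single = {0: set(inputs)}

for name, schema in [("per_pair", per_pair), ("single", single)]:
    print(name, reducer_size(schema), replication_rate(schema, inputs),
          covers(schema, pairs))
```

On this toy instance the per-pair schema keeps reducers small at the cost of sending some inputs to more than one reducer, while the single-reducer schema sends each input once but needs a reducer large enough to hold everything. That is the tradeoff the framework measures.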
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Graph Theory and Algorithms
