Communication Steps for Parallel Query Processing
Paul Beame, Paraschos Koutris, Dan Suciu

TL;DR
This paper investigates the minimum number of communication rounds needed for parallel relational query processing when each server can receive only a bounded amount of data per round, establishing bounds and tradeoffs for different query classes.
Contribution
It provides the first lower bounds for multi-round communication in parallel query processing and characterizes the tradeoff between rounds and space for tree-like queries.
Findings
Single-round lower bounds depend on the fractional vertex cover of the query hypergraph.
An algorithm matching the single-round lower bound is given for a specific class of databases.
Transitive closure cannot be computed in a constant number of communication rounds under the model.
Abstract
We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1 - 1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing…
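To make the single-round bound concrete, the sketch below evaluates it for a few small conjunctive queries. With $\tau^*$ the fractional vertex cover number of the query hypergraph, the one-round lower bound on the space exponent is $\varepsilon \geq 1 - 1/\tau^*$. The function names and the brute-force search are illustrative assumptions, not the paper's method. The search also assumes every atom is a binary relation over two distinct variables, so that the hypergraph is an ordinary graph and the vertex cover linear program has an optimal solution with weights in {0, 1/2, 1}.

```python
from fractions import Fraction
from itertools import product


def fractional_vertex_cover(atoms):
    """Return tau* for a query whose atoms are all binary.

    `atoms` is a list of (var, var) pairs, one per relation in the query.
    Assumption: each atom has exactly two distinct variables, so the
    weights can be restricted to {0, 1/2, 1} (half-integrality of the
    vertex cover LP on graphs). This is a brute-force illustration only.
    """
    if not atoms:
        raise ValueError("query must contain at least one atom")
    for atom in atoms:
        if len(atom) != 2 or atom[0] == atom[1]:
            raise ValueError("only binary atoms over distinct variables are supported")

    variables = sorted({v for atom in atoms for v in atom})
    choices = (Fraction(0), Fraction(1, 2), Fraction(1))

    best = None
    for weights in product(choices, repeat=len(variables)):
        w = dict(zip(variables, weights))
        # Every atom (edge) must be covered with total weight at least 1.
        if all(w[x] + w[y] >= 1 for x, y in atoms):
            total = sum(weights, Fraction(0))
            if best is None or total < best:
                best = total
    return best


def min_space_exponent(atoms):
    """Lower bound 1 - 1/tau* on the space exponent for one round."""
    return 1 - 1 / fractional_vertex_cover(atoms)


# Hypothetical example queries, written as lists of binary atoms.
triangle = [("x", "y"), ("y", "z"), ("z", "x")]   # R(x,y), S(y,z), T(z,x)
two_way_join = [("x", "y"), ("y", "z")]           # R(x,y), S(y,z)
chain3 = [("x", "y"), ("y", "z"), ("z", "w")]     # R(x,y), S(y,z), T(z,w)

for name, q in [("triangle", triangle), ("two_way_join", two_way_join), ("chain3", chain3)]:
    print(name, fractional_vertex_cover(q), min_space_exponent(q))
```

For the triangle query the search gives a cover of weight 3/2 and hence an exponent of at least 1/3, while the two-way join has a cover of weight 1 and the bound allows an exponent of 0, that is, no replication. Queries with atoms of higher arity would need a general linear programming solver rather than this enumeration.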
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Distributed systems and fault tolerance
