Parallelizing Query Optimization on Shared-Nothing Architectures
Immanuel Trummer, Christoph Koch

TL;DR
This paper introduces scalable algorithms for parallel query optimization on shared-nothing architectures, significantly reducing optimization time by dividing the plan space among multiple workers with minimal synchronization.
Contribution
It presents novel parallel algorithms for query optimization that efficiently utilize large clusters without extensive data exchange or synchronization.
Findings
Achieved up to 10x speedup on 100-node clusters.
Parallelization scales linearly with the number of workers and query size.
Effective for large queries with long optimization times.
Abstract
Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query evaluation. We show how to parallelize query optimization at a massive scale. We present algorithms for parallel query optimization in left-deep and bushy plan spaces. At optimization start, we divide the plan space for a given query into partitions of equal size that are explored in parallel by worker nodes. At the end of optimization, each worker returns the optimal plan in its partition to the master which determines the globally optimal plan from the partition-optimal plans. No synchronization or data exchange is required during the actual optimization phase. The amount of data sent over the network, at the start and at the end of optimization,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Database Systems and Queries · Data Management and Algorithms · Graph Theory and Algorithms
