A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads
András Strausz, Niels Pardon, Ioana Giurgiu

TL;DR
This paper introduces a learned cost model-based cross-engine optimizer that automates SQL engine selection, improving query performance and reducing workload runtime across multiple databases and engines.
Contribution
It proposes a multi-task learning-based cost prediction model that simplifies engine addition and enhances query routing accuracy in Lakehouse systems.
Findings
Reduces average Q-error by 12.6% with optimized plans.
Decreases total workload runtime by up to 30.4%.
Enables flexible addition of new engines with minimal fine-tuning.
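The Q-error cited above is commonly defined as the larger of the two ratios between predicted and actual cost, so a perfect prediction scores 1 and over- and under-estimates are penalized symmetrically. The sketch below assumes that standard definition and a plain arithmetic mean; this summary does not state how the paper aggregates the metric, and the function names are illustrative only.

```python
from typing import Sequence


def q_error(predicted: float, actual: float) -> float:
    """Standard Q-error: max(predicted/actual, actual/predicted).

    Assumes both values are strictly positive (e.g. runtimes in seconds).
    """
    if predicted <= 0 or actual <= 0:
        raise ValueError("Q-error is only defined for positive values")
    return max(predicted / actual, actual / predicted)


def mean_q_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Arithmetic mean of per-query Q-errors (an assumed aggregation)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(q_error(p, a) for p, a in zip(predicted, actual)) / len(predicted)
```

Under this definition, predicting 2 s for a query that takes 1 s and predicting 1 s for a query that takes 2 s both give a Q-error of 2.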
Abstract
Lakehouse systems enable the same data to be queried with multiple execution engines. However, selecting the engine best suited to run a SQL query still requires a priori knowledge of the query's computational requirements and each engine's capabilities, a complex and manual task that only becomes more difficult with the emergence of new engines and workloads. In this paper, we address this limitation by proposing a cross-engine optimizer that automates engine selection for diverse SQL queries through a learned cost model. A query plan, optimized with hints, is used for query cost prediction and routing. Cost prediction is formulated as a multi-task learning problem, and the model architecture uses multiple predictor heads, corresponding to different engines and provisionings. This eliminates the need to train engine-specific models and allows the flexible addition of new engines at a…
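The multi-head idea described in the abstract can be illustrated with a toy model: one shared encoder turns plan features into a hidden representation, each engine has its own small predictor head, and a query is routed to the engine with the lowest predicted cost. Everything below is a hypothetical sketch, not the authors' architecture: the class name, the single linear layer with ReLU, the hand-set weights, and the engine names are all assumptions made for illustration, and no training is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def dot(weights: List[float], values: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(w * v for w, v in zip(weights, values))


@dataclass
class MultiHeadCostModel:
    """Toy shared-encoder model with one linear head per engine (hypothetical)."""

    shared_weights: List[List[float]]  # rows of the shared linear layer
    heads: Dict[str, List[float]] = field(default_factory=dict)

    def encode(self, plan_features: List[float]) -> List[float]:
        # Shared representation: linear layer followed by ReLU.
        return [max(0.0, dot(row, plan_features)) for row in self.shared_weights]

    def add_engine(self, name: str, head_weights: List[float]) -> None:
        # Adding an engine only adds a head; the shared encoder is untouched.
        if len(head_weights) != len(self.shared_weights):
            raise ValueError("head size must match the encoder output size")
        self.heads[name] = list(head_weights)

    def predict(self, plan_features: List[float]) -> Dict[str, float]:
        # One predicted cost per registered engine.
        hidden = self.encode(plan_features)
        return {name: dot(w, hidden) for name, w in self.heads.items()}

    def route(self, plan_features: List[float]) -> str:
        # Pick the engine with the lowest predicted cost.
        costs = self.predict(plan_features)
        if not costs:
            raise ValueError("no engines registered")
        return min(costs, key=costs.get)


model = MultiHeadCostModel(shared_weights=[[1.0, 0.0], [0.0, 1.0]])
model.add_engine("engine_a", [1.0, 3.0])
model.add_engine("engine_b", [2.0, 1.0])
print(model.predict([1.0, 2.0]))  # per-engine predicted costs
print(model.route([1.0, 2.0]))    # engine with the lowest predicted cost
```

In this sketch, registering a new engine means attaching one more head while the shared encoder stays fixed, which mirrors the abstract's claim that engines can be added without training engine-specific models.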
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
