Fine-Tuning Data Structures for Analytical Query Processing
Amir Shaikhha, Marios Kelepeshis, Mahdi Ghorbani

TL;DR
This paper presents a framework that automatically selects data structures for analytical query processing by combining a novel low-level language, machine learning, and static analysis, resulting in performance comparable to or better than existing systems.
Contribution
It introduces a new low-level intermediate language for query algorithms and a combined machine learning and static analysis cost model for data structure selection.
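To make the dictionary-centric view concrete, here is a minimal sketch of a groupjoin (a join fused with an aggregation) written as plain dictionary operations. It is not the paper's intermediate language: the function name `groupjoin_sum`, the tuple layouts, and the sample data are assumptions for illustration, and Python's built-in hash-based `dict` stands in for whichever dictionary implementation the framework would select.

```python
from collections import defaultdict


def groupjoin_sum(orders, lineitems):
    """For each order, sum the prices of its matching line items.

    orders:    iterable of (order_key, customer) tuples   (hypothetical layout)
    lineitems: iterable of (order_key, price) tuples      (hypothetical layout)
    """
    # Build phase: aggregate line items into a dictionary keyed by order_key.
    totals = defaultdict(float)
    for order_key, price in lineitems:
        totals[order_key] += price

    # Probe phase: look up each order in the dictionary.
    # Orders without any line item are dropped (inner-join semantics).
    result = {}
    for order_key, customer in orders:
        if order_key in totals:
            result[order_key] = (customer, totals[order_key])
    return result


# Hypothetical sample data.
orders = [(1, "alice"), (2, "bob"), (3, "carol")]
lineitems = [(1, 10.0), (1, 5.0), (2, 7.5)]
result = groupjoin_sum(orders, lineitems)
```

Both phases touch the data only through dictionary inserts, updates, and lookups, which is why the choice of dictionary implementation can dominate the running time of such a query.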
Findings
The framework's cost model effectively predicts performance on microbenchmarks.
The generated code matches or outperforms state-of-the-art query engines.
The framework demonstrates practical benefits on analytical workloads.
Abstract
We introduce a framework for automatically choosing data structures to support efficient computation of analytical workloads. Our contributions are twofold. First, we introduce a novel low-level intermediate language that can express the algorithms behind various query processing paradigms such as classical joins, groupjoin, and in-database machine learning engines. This language is designed around the notion of dictionaries, and allows for a more fine-grained choice of its low-level implementation. Second, the cost model for alternative implementations is automatically inferred by combining machine learning and program reasoning. The dictionary cost model is learned using a regression model trained over the profiling dataset of dictionary operations on a given hardware architecture. The program cost model is inferred using static program analysis. Our experimental results show the…
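The learned dictionary cost model described in the abstract can be illustrated with a small sketch. It fits one straight line per dictionary implementation to profiling measurements and then picks the implementation with the lowest predicted cost. Everything specific here is an assumption: the implementation names `hash_table` and `sorted_array`, the synthetic timings, the single feature (number of lookups), and the use of ordinary least squares. The paper's actual features and regression model may differ, and the operation count is supplied by hand where the paper would obtain it from static analysis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic profiling data (hypothetical, not measured):
# number of lookups -> time in microseconds, per implementation.
sizes = [1000, 2000, 4000, 8000]
profile = {
    "hash_table": [70.0, 90.0, 130.0, 210.0],    # high fixed cost, cheap lookups
    "sorted_array": [45.0, 85.0, 165.0, 325.0],  # low fixed cost, dearer lookups
}

# Train one cost model per dictionary implementation.
models = {name: fit_line(sizes, times) for name, times in profile.items()}


def predict(name, num_lookups):
    """Predicted cost of running num_lookups lookups on the given implementation."""
    intercept, slope = models[name]
    return intercept + slope * num_lookups


def choose(num_lookups):
    """Pick the implementation with the lowest predicted cost."""
    return min(models, key=lambda name: predict(name, num_lookups))


small_choice = choose(500)
large_choice = choose(10000)
```

With this synthetic data the two fitted lines cross at about 2,250 lookups, so `choose` returns `sorted_array` for small workloads and `hash_table` for large ones.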
Taxonomy
Topics: Advanced Database Systems and Queries · Machine Learning and Data Classification · Mass Spectrometry Techniques and Applications
