Building Efficient Query Engines in a High-Level Language
Amir Shaikhha, Yannis Klonatos, Christoph Koch

TL;DR
This paper presents LegoBase, a high-level Scala-based query engine that uses generative programming to optimize performance, achieving significant speedups over traditional systems while maintaining high productivity and ease of extension.
Contribution
LegoBase, a query engine written in Scala that applies source-to-source compilation to turn high-level engine code into specialized low-level C, yielding an efficient, customizable system with minimal low-level coding.
Findings
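On the TPC-H benchmark, LegoBase outperforms both a commercial in-memory database and an existing query compiler.
The optimizations take only a few hundred lines of high-level code, rather than the low-level code that existing query compilation approaches require.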
Compilation overhead is low relative to execution time.
Abstract
Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum…
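The source-to-source idea in the abstract can be illustrated with a toy sketch. This is not LegoBase's code or API: the plan classes (`Scan`, `Select`, `SumAgg`), the function `gen_c`, and the table and column names are all hypothetical, and the sketch is in Python rather than Scala for brevity. It shows only the general technique of taking a high-level query plan and emitting one specialized, fused C loop for it instead of interpreting operators one by one.

```python
from dataclasses import dataclass

# Hypothetical high-level plan nodes (illustrative only, not LegoBase's operators).
@dataclass
class Scan:
    table: str        # name of a C array of row structs assumed to exist in the target program

@dataclass
class Select:
    child: object
    predicate: str    # C boolean expression over the current row pointer `r`

@dataclass
class SumAgg:
    child: object
    expr: str         # C numeric expression over the current row pointer `r`

def gen_c(plan):
    """Emit C source for SumAgg(Select*(Scan)) as a single fused loop."""
    if not isinstance(plan, SumAgg):
        raise TypeError("this sketch only handles a SumAgg at the root")
    # Collect the predicates of all stacked Select nodes (outermost first).
    preds = []
    node = plan.child
    while isinstance(node, Select):
        preds.append(node.predicate)
        node = node.child
    if not isinstance(node, Scan):
        raise TypeError("this sketch only handles Select chains over a Scan")
    # Innermost predicate is tested first; no Select means the condition is always true.
    cond = " && ".join(f"({p})" for p in reversed(preds)) or "1"
    t = node.table
    return "\n".join([
        "double run_query(void) {",
        "  double acc = 0.0;",
        f"  for (long i = 0; i < {t}_len; i++) {{",
        f"    const struct {t}_row *r = &{t}[i];",
        f"    if ({cond}) acc += {plan.expr};",
        "  }",
        "  return acc;",
        "}",
    ])

# Example plan: sum(price * discount) over rows with quantity < 24.
plan = SumAgg(Select(Scan("lineitem"), "r->quantity < 24"), "r->price * r->discount")
c_source = gen_c(plan)
```

Printing `c_source` gives a C function with the filter and the aggregation inlined into one loop, with no per-operator function calls left at run time. A real engine of the kind the abstract describes would go much further (data-structure specialization, memory layout, and so on); the sketch covers only operator fusion.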
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Database Systems and Queries · Distributed and Parallel Computing Systems
